201
Jaschke PR, Dotson GA, Hung KS, Liu D, Endy D. Definitive demonstration by synthesis of genome annotation completeness. Proc Natl Acad Sci U S A 2019; 116:24206-24213. [PMID: 31719208 PMCID: PMC6883844 DOI: 10.1073/pnas.1905990116] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 12/20/2022]
Abstract
We develop a method for completing the genetics of natural living systems by which the absence of expected future discoveries can be established. We demonstrate the method using bacteriophage øX174, the first DNA genome to be sequenced. Like many well-studied natural organisms, closely related genome sequences are available: 23 Bullavirinae genomes related to øX174. Using bioinformatic tools, we first identified 315 potential open reading frames (ORFs) within the genome, including the 11 established essential genes and 82 highly conserved ORFs that have no known gene products or assigned functions. Using genome-scale design and synthesis, we made a mutant genome in which all 11 essential genes are simultaneously disrupted, leaving intact only the 82 conserved but cryptic ORFs. The resulting genome is not viable. Cell-free gene expression followed by mass spectrometry revealed only a single peptide expressed from both the cryptic ORF and wild-type genomes, suggesting a potential new gene. A second synthetic genome in which 71 conserved cryptic ORFs were simultaneously disrupted is viable but with ∼50% reduced fitness relative to the wild type. However, rather than finding any new genes, repeated evolutionary adaptation revealed a single point mutation that modulates expression of gene H, a known essential gene, and fully suppresses the fitness defect. Taken together, we conclude that the annotation of currently functional ORFs for the øX174 genome is formally complete. More broadly, we show that sequencing and bioinformatics followed by synthesis-enabled reverse genomics, proteomics, and evolutionary adaptation can definitively establish the sufficiency and completeness of natural genome annotations.
Affiliation(s)
- Paul R Jaschke
  - Department of Molecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
- Kay S Hung
  - Bioengineering Department, Stanford University, Stanford, CA 94305
- Diane Liu
  - Bioengineering Department, Stanford University, Stanford, CA 94305
- Drew Endy
  - Bioengineering Department, Stanford University, Stanford, CA 94305
202
Willemsen A, Félez-Sánchez M, Bravo IG. Genome Plasticity in Papillomaviruses and De Novo Emergence of E5 Oncogenes. Genome Biol Evol 2019; 11:1602-1617. [PMID: 31076746 PMCID: PMC6557308 DOI: 10.1093/gbe/evz095] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Accepted: 04/29/2019] [Indexed: 02/06/2023]
Abstract
The clinical presentations of papillomavirus (PV) infections come in many different flavors. While most PVs are part of a healthy skin microbiota and are not associated to physical lesions, other PVs cause benign lesions, and only a handful of PVs are associated to malignant transformations linked to the specific activities of the E5, E6, and E7 oncogenes. The functions and origin of E5 remain to be elucidated. These E5 open reading frames (ORFs) are present in the genomes of a few polyphyletic PV lineages, located between the early and the late viral gene cassettes. We have computationally assessed whether these E5 ORFs have a common origin and whether they display the properties of a genuine gene. Our results suggest that during the evolution of Papillomaviridae, at least four events lead to the presence of a long noncoding DNA stretch between the E2 and the L2 genes. In three of these events, the novel regions evolved coding capacity, becoming the extant E5 ORFs. We then focused on the evolution of the E5 genes in AlphaPVs infecting primates. The sharp match between the type of E5 protein encoded in AlphaPVs and the infection phenotype (cutaneous warts, genital warts, or anogenital cancers) supports the role of E5 in the differential oncogenic potential of these PVs. In our analyses, the best-supported scenario is that the five types of extant E5 proteins within the AlphaPV genomes may not have a common ancestor. However, the chemical similarities between E5s regarding amino acid composition prevent us from confidently rejecting the model of a common origin. Our evolutionary interpretation is that an originally noncoding region entered the genome of the ancestral AlphaPVs. This genetic novelty allowed to explore novel transcription potential, triggering an adaptive radiation that yielded three main viral lineages encoding for different E5 proteins, displaying distinct infection phenotypes. 
Overall, our results provide an evolutionary scenario for the de novo emergence of viral genes and illustrate the impact of such genotypic novelty in the phenotypic diversity of the viral infections.
Affiliation(s)
- Anouk Willemsen
  - Laboratory MIVEGEC (UMR CNRS IRD Uni Montpellier), Centre National de la Recherche Scientifique (CNRS), Montpellier, France
- Marta Félez-Sánchez
  - Infections and Cancer Laboratory, Catalan Institute of Oncology (ICO), Barcelona, Spain
- Ignacio G Bravo
  - Laboratory MIVEGEC (UMR CNRS IRD Uni Montpellier), Centre National de la Recherche Scientifique (CNRS), Montpellier, France
203
Keeling DM, Garza P, Nartey CM, Carvunis AR. The meanings of 'function' in biology and the problematic case of de novo gene emergence. eLife 2019; 8:e47014. [PMID: 31674305 PMCID: PMC6824840 DOI: 10.7554/elife.47014] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 03/20/2019] [Accepted: 10/11/2019] [Indexed: 12/24/2022]
Abstract
The word function has many different meanings in molecular biology. Here we explore the use of this word (and derivatives like functional) in research papers about de novo gene birth. Based on an analysis of 20 abstracts we propose a simple lexicon that, we believe, will help scientists and philosophers discuss the meaning of function more clearly.
Affiliation(s)
- Diane Marie Keeling
  - Department of Communication Studies, College of Arts & Sciences, University of San Diego, San Diego, United States
- Anne-Ruxandra Carvunis
  - Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, United States
  - Pittsburgh Center for Evolutionary Biology and Medicine, University of Pittsburgh School of Medicine, Pittsburgh, United States
204
Stewart NB, Rogers RL. Chromosomal rearrangements as a source of new gene formation in Drosophila yakuba. PLoS Genet 2019; 15:e1008314. [PMID: 31545792 PMCID: PMC6776367 DOI: 10.1371/journal.pgen.1008314] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 01/31/2019] [Revised: 10/03/2019] [Accepted: 07/17/2019] [Indexed: 11/19/2022]
Abstract
The origins of new genes are among the most fundamental questions in evolutionary biology. Our understanding of the ways that new genetic material appears and how that genetic material shapes population variation remains incomplete. De novo genes and duplicate genes are a key source of new genetic material on which selection acts. To better understand the origins of these new gene sequences, we explored the ways that structural variation might alter expression patterns and form novel transcripts. We provide evidence that chromosomal rearrangements are a source of novel genetic variation that facilitates the formation of de novo exons in Drosophila. We identify 51 cases of de novo exon formation created by chromosomal rearrangements in 14 strains of D. yakuba. These new genes inherit transcription start signals and open reading frames when the 5' end of existing genes are combined with previously untranscribed regions. Such new genes would appear with novel peptide sequences, without the necessity for secondary transitions from non-coding RNA to protein. This mechanism of new peptide formations contrasts with canonical theory of de novo gene progression requiring non-coding intermediaries that must acquire new mutations prior to loss via pseudogenization. Hence, these mutations offer a means to de novo gene creation and protein sequence formation in a single mutational step, answering a long standing open question concerning new gene formation. We further identify gene expression changes to 134 existing genes, indicating that these mutations can alter gene regulation. Population variability for chromosomal rearrangements is considerable, with 2368 rearrangements observed across 14 inbred lines. More rearrangements were identified on the X chromosome than any of the autosomes, suggesting the X is more susceptible to chromosome alterations. 
Together, these results suggest that chromosomal rearrangements are a source of variation in populations that is likely to be important to explain genetic and therefore phenotypic diversity.
Affiliation(s)
- Nicholas B. Stewart
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, North Carolina, United States of America
  - Department of Biological Sciences, Ft Hays State University, Ft Hays, Kansas, United States of America
- Rebekah L. Rogers
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, North Carolina, United States of America
205
Arendsee Z, Li J, Singh U, Bhandary P, Seetharam A, Wurtele ES. fagin: synteny-based phylostratigraphy and finer classification of young genes. BMC Bioinformatics 2019; 20:440. [PMID: 31455236 PMCID: PMC6712868 DOI: 10.1186/s12859-019-3023-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 03/14/2019] [Accepted: 08/08/2019] [Indexed: 12/30/2022]
Abstract
BACKGROUND With every new genome that is sequenced, thousands of species-specific genes (orphans) are found, some originating from ultra-rapid mutations of existing genes, many others originating de novo from non-genic regions of the genome. If some of these genes survive across speciations, then extant organisms will contain a patchwork of genes whose ancestors first appeared at different times. Standard phylostratigraphy, the technique of partitioning genes by their age, is based solely on protein similarity algorithms. However, this approach relies on negative evidence: a failure to detect a homolog of a query gene. An alternative approach is to limit the search for homologs to syntenic regions. Then, genes can be positively identified as de novo orphans by tracing them to non-coding sequences in related species. RESULTS We have developed a synteny-based pipeline in the R framework. Fagin determines the genomic context of each query gene in a focal species compared to homologous sequence in target species. We tested the fagin pipeline on two focal species, Arabidopsis thaliana (plus four target species in Brassicaceae) and Saccharomyces cerevisiae (plus six target species in Saccharomyces). Using microsynteny maps, fagin classified the homology relationship of each query gene against each target genome into three main classes, and further subclasses: AAic (has a coding syntenic homolog), NTic (has a non-coding syntenic homolog), and Unknown (has no detected syntenic homolog). fagin inferred over half the "Unknown" A. thaliana query genes, and about 20% for S. cerevisiae, as lacking a syntenic homolog because of local indels or scrambled synteny. CONCLUSIONS fagin augments standard phylostratigraphy, and extends synteny-based phylostratigraphy with an automated, customizable, and detailed contextual analysis.
By comparing synteny-based phylostrata to standard phylostrata, fagin systematically identifies those orphans and lineage-specific genes that are well-supported to have originated de novo. Analyzing within-species genomes should distinguish orphan genes that may have originated through rapid divergence from de novo orphans. Fagin also delineates whether a gene has no syntenic homolog because of technical or biological reasons. These analyses indicate that some orphans may be associated with regions of high genomic perturbation.
Affiliation(s)
- Zebulun Arendsee
  - Department of Genetics Development and Cell Biology, Iowa State University, Ames, IA, 50010, USA
  - Center for Metabolic Biology, Iowa State University, Ames, IA, 50011, USA
  - Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, 50011, USA
- Jing Li
  - Department of Genetics Development and Cell Biology, Iowa State University, Ames, IA, 50010, USA
  - Center for Metabolic Biology, Iowa State University, Ames, IA, 50011, USA
- Urminder Singh
  - Department of Genetics Development and Cell Biology, Iowa State University, Ames, IA, 50010, USA
  - Center for Metabolic Biology, Iowa State University, Ames, IA, 50011, USA
  - Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, 50011, USA
- Priyanka Bhandary
  - Department of Genetics Development and Cell Biology, Iowa State University, Ames, IA, 50010, USA
  - Center for Metabolic Biology, Iowa State University, Ames, IA, 50011, USA
  - Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, 50011, USA
- Arun Seetharam
  - Genome Informatics Facility, Office of Biotechnology, Iowa State University, Ames, IA, 50011, USA
- Eve Syrkin Wurtele
  - Department of Genetics Development and Cell Biology, Iowa State University, Ames, IA, 50010, USA
  - Center for Metabolic Biology, Iowa State University, Ames, IA, 50011, USA
  - Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, 50011, USA
206
Talkish J, Igel H, Perriman RJ, Shiue L, Katzman S, Munding EM, Shelansky R, Donohue JP, Ares M. Rapidly evolving protointrons in Saccharomyces genomes revealed by a hungry spliceosome. PLoS Genet 2019; 15:e1008249. [PMID: 31437148 PMCID: PMC6726248 DOI: 10.1371/journal.pgen.1008249] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 01/26/2019] [Revised: 09/04/2019] [Accepted: 06/15/2019] [Indexed: 12/14/2022]
Abstract
Introns are a prevalent feature of eukaryotic genomes, yet their origins and contributions to genome function and evolution remain mysterious. In budding yeast, repression of the highly transcribed intron-containing ribosomal protein genes (RPGs) globally increases splicing of non-RPG transcripts through reduced competition for the spliceosome. We show that under these “hungry spliceosome” conditions, splicing occurs at more than 150 previously unannotated locations we call protointrons that do not overlap known introns. Protointrons use a less constrained set of splice sites and branchpoints than standard introns, including in one case AT-AC in place of GT-AG. Protointrons are not conserved in all closely related species, suggesting that most are not under positive selection and are fated to disappear. Some are found in non-coding RNAs (e.g., CUTs and SUTs), where they may contribute to the creation of new genes. Others are found across boundaries between noncoding and coding sequences, or within coding sequences, where they offer pathways to the creation of new protein variants, or new regulatory controls for existing genes. We define protointrons as (1) nonconserved intron-like sequences that are (2) infrequently spliced, and importantly (3) are not currently understood to contribute to gene expression or regulation in the way that standard introns function. A very few protointrons in S. cerevisiae challenge this classification by their increased splicing frequency and potential function, consistent with the proposed evolutionary process of “intronization”, whereby new standard introns are created. This snapshot of intron evolution highlights the important role of the spliceosome in the expansion of transcribed genomic sequence space, providing a pathway for the rare events that may lead to the birth of new eukaryotic genes and the refinement of existing gene function.
The protein coding information in eukaryotic genes is broken by intervening sequences called introns that are removed from RNA during transcription by a large protein-RNA complex called the spliceosome. Where introns come from and how the spliceosome contributes to genome evolution are open questions. In this study, we find more than 150 new places in the yeast genome that are recognized by the spliceosome and spliced out as introns. Since they appear to have arisen very recently in evolution by sequence drift and do not appear to contribute to gene expression or its regulation, we call these protointrons. Protointrons are found in both protein-coding and non-coding RNAs and are not efficiently removed by the splicing machinery. Although most protointrons are not conserved and will likely disappear as evolution proceeds, a few are spliced more efficiently, and are located where they might begin to play functional roles in gene expression, as predicted by the proposed process of intronization. The challenge now is to understand how spontaneously appearing splicing events like protointrons might contribute to the creation of new genes, new genetic controls, and new protein isoforms as genomes evolve.
Affiliation(s)
- Jason Talkish
  - Center for Molecular Biology of RNA, Department of Molecular, Cell & Developmental Biology, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Haller Igel
  - Center for Molecular Biology of RNA, Department of Molecular, Cell & Developmental Biology, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Rhonda J. Perriman
  - Center for Molecular Biology of RNA, Department of Molecular, Cell & Developmental Biology, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Lily Shiue
  - Center for Molecular Biology of RNA, Department of Molecular, Cell & Developmental Biology, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Sol Katzman
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Elizabeth M. Munding
  - Center for Molecular Biology of RNA, Department of Molecular, Cell & Developmental Biology, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Robert Shelansky
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, California, United States of America
- John Paul Donohue
  - Center for Molecular Biology of RNA, Department of Molecular, Cell & Developmental Biology, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Manuel Ares
  - Center for Molecular Biology of RNA, Department of Molecular, Cell & Developmental Biology, University of California, Santa Cruz, Santa Cruz, California, United States of America
207
Evolutionary characteristics of intergenic transcribed regions indicate rare novel genes and widespread noisy transcription in the Poaceae. Sci Rep 2019; 9:12122. [PMID: 31431676 PMCID: PMC6702216 DOI: 10.1038/s41598-019-47797-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/08/2019] [Accepted: 07/19/2019] [Indexed: 01/19/2023]
Abstract
Extensive transcriptional activity occurring in intergenic regions of genomes has raised the question whether intergenic transcription represents the activity of novel genes or noisy expression. To address this, we evaluated cross-species and post-duplication sequence and expression conservation of intergenic transcribed regions (ITRs) in four Poaceae species. Among 43,301 ITRs across the four species, 34,460 (80%) are species-specific. ITRs found across species tend to be more divergent in expression and have more recent duplicates compared to annotated genes. To assess if ITRs are functional (under selection), machine learning models were established in Oryza sativa (rice) that could accurately distinguish between phenotype genes and pseudogenes (area under curve-receiver operating characteristic = 0.94). Based on the models, 584 (8%) and 4391 (61%) rice ITRs are classified as likely functional and nonfunctional with high confidence, respectively. ITRs with conserved expression and ancient retained duplicates, features that were not part of the model, are frequently classified as likely-functional, suggesting these characteristics could serve as pragmatic rules of thumb for identifying candidate sequences likely to be under selection. This study also provides a framework to identify novel genes using comparative transcriptomic data to improve genome annotation that is fundamental for connecting genotype to phenotype in crop and model systems.
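For readers unfamiliar with the metric quoted in this abstract, the area under the ROC curve can be read as the probability that a randomly chosen positive example (here, a phenotype gene) receives a higher model score than a randomly chosen negative example (a pseudogene). The following is a minimal sketch of that pairwise definition; the function name `roc_auc` and the toy labels and scores are assumptions made for illustration and are not data or code from the cited study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 1 (positive) or 0 (negative)
    scores: iterable of model scores, higher = more likely positive
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 3 of the 4 positive/negative pairs
# are ranked correctly, so the AUC is 0.75.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The quadratic pairwise loop is chosen for readability; for large data sets a rank-based computation or an established library routine would be the usual choice.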
208
Witt E, Benjamin S, Svetec N, Zhao L. Testis single-cell RNA-seq reveals the dynamics of de novo gene transcription and germline mutational bias in Drosophila. eLife 2019; 8:e47138. [PMID: 31418408 PMCID: PMC6697446 DOI: 10.7554/elife.47138] [Citation(s) in RCA: 82] [Impact Index Per Article: 13.7] [Received: 03/25/2019] [Accepted: 07/06/2019] [Indexed: 12/25/2022]
Abstract
The testis is a peculiar tissue in many respects. It shows patterns of rapid gene evolution and provides a hotspot for the origination of genetic novelties such as de novo genes, duplications and mutations. To investigate the expression patterns of genetic novelties across cell types, we performed single-cell RNA-sequencing of adult Drosophila testis. We found that new genes were expressed in various cell types, the patterns of which may be influenced by their mode of origination. In particular, lineage-specific de novo genes are commonly expressed in early spermatocytes, while young duplicated genes are often bimodally expressed. Analysis of germline substitutions suggests that spermatogenesis is a highly reparative process, with the mutational load of germ cells decreasing as spermatogenesis progresses. By elucidating the distribution of genetic novelties across spermatogenesis, this study provides a deeper understanding of how the testis maintains its core reproductive function while being a hotbed of evolutionary innovation.
Affiliation(s)
- Evan Witt
  - Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, United States
- Sigi Benjamin
  - Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, United States
- Nicolas Svetec
  - Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, United States
- Li Zhao
  - Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, United States
209
Fesenko I, Kirov I, Kniazev A, Khazigaleeva R, Lazarev V, Kharlampieva D, Grafskaia E, Zgoda V, Butenko I, Arapidi G, Mamaeva A, Ivanov V, Govorun V. Distinct types of short open reading frames are translated in plant cells. Genome Res 2019; 29:1464-1477. [PMID: 31387879 PMCID: PMC6724668 DOI: 10.1101/gr.253302.119] [Citation(s) in RCA: 48] [Impact Index Per Article: 8.0] [Received: 06/04/2019] [Accepted: 08/01/2019] [Indexed: 02/07/2023]
Abstract
Genomes contain millions of short (<100 codons) open reading frames (sORFs), which are usually dismissed during gene annotation. Nevertheless, peptides encoded by such sORFs can play important biological roles, and their impact on cellular processes has long been underestimated. Here, we analyzed approximately 70,000 transcribed sORFs in the model plant Physcomitrella patens (moss). Several distinct classes of sORFs that differ in terms of their position on transcripts and the level of evolutionary conservation are present in the moss genome. Over 5000 sORFs were conserved in at least one of 10 plant species examined. Mass spectrometry analysis of proteomic and peptidomic data sets suggested that tens of sORFs located on distinct parts of mRNAs and long noncoding RNAs (lncRNAs) are translated, including conserved sORFs. Translational analysis of the sORFs and main ORFs at a single locus suggested the existence of genes that code for multiple proteins and peptides with tissue-specific expression. Functional analysis of four lncRNA-encoded peptides showed that sORFs-encoded peptides are involved in regulation of growth and differentiation in moss. Knocking out lncRNA-encoded peptides resulted in a decrease of moss growth. In contrast, the overexpression of these peptides resulted in a diverse range of phenotypic effects. Our results thus open new avenues for discovering novel, biologically active peptides in the plant kingdom.
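To make the abstract's definition of a short ORF concrete (fewer than 100 codons), the sketch below scans all six reading frames of a DNA string for ATG-to-stop ORFs under that length. It is an illustration under stated assumptions, not the pipeline used in the cited study: the function names, the requirement of an ATG start, the standard stop codons, and the `min_codons` cutoff are all choices made here for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}          # standard genetic code stops
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sorfs(seq, max_codons=100, min_codons=2):
    """List ATG..stop ORFs with min_codons <= length < max_codons.

    Returns tuples (strand, start, end, n_codons). Coordinates are 0-based,
    end-exclusive, and given on the scanned strand; n_codons excludes the
    stop codon. Both thresholds are illustrative assumptions.
    """
    hits = []
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # first ATG opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if min_codons <= n_codons < max_codons:
                        hits.append((strand, start, i + 3, n_codons))
                    start = None                   # look for the next ORF
    return hits


# Hypothetical toy sequence: one two-codon ORF (ATG AAA) followed by TGA.
print(find_sorfs("CCATGAAATGA"))
```

A real annotation workflow would additionally consider alternative start codons, transcript boundaries, and evidence of translation, none of which this sketch attempts.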
Affiliation(s)
- Igor Fesenko
  - Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, 117997 Moscow, Russian Federation
- Ilya Kirov
  - Laboratory of marker-assisted and genomic selection of plants, All-Russian Research Institute of Agricultural Biotechnology, 127550 Moscow, Russian Federation
- Andrey Kniazev
  - Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, 117997 Moscow, Russian Federation
- Regina Khazigaleeva
  - Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, 117997 Moscow, Russian Federation
- Vassili Lazarev
  - Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russian Federation
  - Moscow Institute of Physics and Technology (National Research University), 141701 Dolgoprudny, Moscow Region, Russian Federation
- Daria Kharlampieva
  - Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russian Federation
- Ekaterina Grafskaia
  - Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russian Federation
  - Moscow Institute of Physics and Technology (National Research University), 141701 Dolgoprudny, Moscow Region, Russian Federation
- Viktor Zgoda
  - Laboratory of System Biology, Institute of Biomedical Chemistry, 119121 Moscow, Russian Federation
- Ivan Butenko
  - Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russian Federation
- Georgy Arapidi
  - Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, 117997 Moscow, Russian Federation
  - Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russian Federation
- Anna Mamaeva
  - Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, 117997 Moscow, Russian Federation
- Vadim Ivanov
  - Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, 117997 Moscow, Russian Federation
- Vadim Govorun
  - Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russian Federation
210
Talkish J, Igel H, Perriman RJ, Shiue L, Katzman S, Munding EM, Shelansky R, Donohue JP, Ares M. Rapidly evolving protointrons in Saccharomyces genomes revealed by a hungry spliceosome. PLoS Genet 2019; 15:e1008249. [PMID: 31437148 DOI: 10.1101/515197] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/26/2019] [Revised: 09/04/2019] [Accepted: 06/15/2019] [Indexed: 05/28/2023]
Abstract
Introns are a prevalent feature of eukaryotic genomes, yet their origins and contributions to genome function and evolution remain mysterious. In budding yeast, repression of the highly transcribed intron-containing ribosomal protein genes (RPGs) globally increases splicing of non-RPG transcripts through reduced competition for the spliceosome. We show that under these "hungry spliceosome" conditions, splicing occurs at more than 150 previously unannotated locations we call protointrons that do not overlap known introns. Protointrons use a less constrained set of splice sites and branchpoints than standard introns, including in one case AT-AC in place of GT-AG. Protointrons are not conserved in all closely related species, suggesting that most are not under positive selection and are fated to disappear. Some are found in non-coding RNAs (e.g., CUTs and SUTs), where they may contribute to the creation of new genes. Others are found across boundaries between noncoding and coding sequences, or within coding sequences, where they offer pathways to the creation of new protein variants, or new regulatory controls for existing genes. We define protointrons as (1) nonconserved intron-like sequences that are (2) infrequently spliced, and importantly (3) are not currently understood to contribute to gene expression or regulation in the way that standard introns function. A very few protointrons in S. cerevisiae challenge this classification by their increased splicing frequency and potential function, consistent with the proposed evolutionary process of "intronization", whereby new standard introns are created. This snapshot of intron evolution highlights the important role of the spliceosome in the expansion of transcribed genomic sequence space, providing a pathway for the rare events that may lead to the birth of new eukaryotic genes and the refinement of existing gene function.
Collapse
Affiliation(s)
- Jason Talkish
- Center for Molecular Biology of RNA, Department of Molecular, Cell & Developmental Biology, University of California, Santa Cruz, Santa Cruz, California, United States of America
| | - Haller Igel
- Center for Molecular Biology of RNA, Department of Molecular, Cell & Developmental Biology, University of California, Santa Cruz, Santa Cruz, California, United States of America
| | - Rhonda J Perriman
- Center for Molecular Biology of RNA, Department of Molecular, Cell & Developmental Biology, University of California, Santa Cruz, Santa Cruz, California, United States of America
| | - Lily Shiue
- Center for Molecular Biology of RNA, Department of Molecular, Cell & Developmental Biology, University of California, Santa Cruz, Santa Cruz, California, United States of America
| | - Sol Katzman
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, California, United States of America
| | - Elizabeth M Munding
- Center for Molecular Biology of RNA, Department of Molecular, Cell & Developmental Biology, University of California, Santa Cruz, Santa Cruz, California, United States of America
| | - Robert Shelansky
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, California, United States of America
| | - John Paul Donohue
- Center for Molecular Biology of RNA, Department of Molecular, Cell & Developmental Biology, University of California, Santa Cruz, Santa Cruz, California, United States of America
| | - Manuel Ares
- Center for Molecular Biology of RNA, Department of Molecular, Cell & Developmental Biology, University of California, Santa Cruz, Santa Cruz, California, United States of America
| |
|
211
|
Pouvreau B, Fenske R, Ivanova A, Murcha MW, Mylne JS. An interstitial peptide is readily processed from within seed proteins. PLANT SCIENCE : AN INTERNATIONAL JOURNAL OF EXPERIMENTAL PLANT BIOLOGY 2019; 285:175-183. [PMID: 31203882 DOI: 10.1016/j.plantsci.2019.05.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2019] [Revised: 04/25/2019] [Accepted: 05/02/2019] [Indexed: 06/09/2023]
Abstract
The importance of de novo protein evolution is apparent, but most examples are de novo coding transcripts evolving from silent or non-coding DNA. The peptide macrocycle SunFlower Trypsin Inhibitor 1 (SFTI-1) evolved over 45 million years from genetic expansion within the N-terminal 'discarded' region of an ancestral seed albumin precursor. SFTI-1 and its adjacent albumin are both processed into separate, mature forms by asparaginyl endopeptidase (AEP). Here, to determine whether the evolution of SFTI-1 in a latent region of its precursor was critical, we used a transgene approach in A. thaliana analysed by peptide mass spectrometry and RT-qPCR. SFTI could emerge from alternative locations within preproalbumin as well as emerge with precision from unrelated seed proteins via AEP-processing. SFTI production was possible with the adjacent albumin, but peptide levels dropped greatly without the albumin. The ability for SFTI to be processed from multiple sequence contexts and different proteins suggests that to make peptide, it was not crucial for the genetic expansion that gave rise to SFTI and its family to be within a latent protein region. Interstitial peptides, evolving like SFTI within existing proteins, might be more widespread and, as a mechanism, SFTI exemplifies a stable, new, functional peptide that did not need a new gene to evolve de novo.
Affiliation(s)
- Benjamin Pouvreau
- School of Molecular Sciences, The University of Western Australia, 35 Stirling Highway, Crawley, Perth, 6009, Australia; The ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, 35 Stirling Highway, Crawley, Perth, 6009, Australia
| | - Ricarda Fenske
- School of Molecular Sciences, The University of Western Australia, 35 Stirling Highway, Crawley, Perth, 6009, Australia; The ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, 35 Stirling Highway, Crawley, Perth, 6009, Australia
| | - Aneta Ivanova
- School of Molecular Sciences, The University of Western Australia, 35 Stirling Highway, Crawley, Perth, 6009, Australia; The ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, 35 Stirling Highway, Crawley, Perth, 6009, Australia
| | - Monika W Murcha
- School of Molecular Sciences, The University of Western Australia, 35 Stirling Highway, Crawley, Perth, 6009, Australia; The ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, 35 Stirling Highway, Crawley, Perth, 6009, Australia
| | - Joshua S Mylne
- School of Molecular Sciences, The University of Western Australia, 35 Stirling Highway, Crawley, Perth, 6009, Australia; The ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, 35 Stirling Highway, Crawley, Perth, 6009, Australia.
| |
|
212
|
Nielly-Thibault L, Landry CR. Differences Between the Raw Material and the Products of de Novo Gene Birth Can Result from Mutational Biases. Genetics 2019; 212:1353-1366. [PMID: 31227545 PMCID: PMC6707459 DOI: 10.1534/genetics.119.302187] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 01/18/2019] [Accepted: 06/14/2019] [Indexed: 12/03/2022] Open
Abstract
Proteins are among the most important constituents of biological systems. Because all protein-coding genes have a noncoding ancestral form, the properties of noncoding sequences and how they shape the birth of novel proteins may influence the structure and function of all proteins. Differences between the properties of young proteins and random expectations from noncoding sequences have previously been interpreted as the result of natural selection. However, interpreting such deviations requires a yet-unattained understanding of the raw material of de novo gene birth and its relation to novel functional proteins. We mathematically show that the average properties and selective filtering of the "junk" polypeptides of which this raw material is composed are not the only factors influencing the properties of novel functional proteins. We find that in some biological scenarios, they also depend on the variance of the properties of junk polypeptides and their correlation with the rate of allelic turnover, which may itself depend on mutational biases. This suggests for instance that any property of polypeptides that accelerates their exploration of the sequence space could be overrepresented in novel functional proteins, even if it has a limited effect on adaptive value. To exemplify the use of our general theoretical results, we build a simple model that predicts the mean length and mean intrinsic disorder of novel functional proteins from the genomic GC content and a single evolutionary parameter. This work provides a theoretical framework that can guide the prediction and interpretation of results when studying the de novo emergence of protein-coding genes.
Affiliation(s)
- Lou Nielly-Thibault
- Institut de Biologie Intégrative et des Systèmes, Université Laval, Quebec, Quebec G1V 0A6, Canada
- Département de Biologie, Université Laval, Quebec, Quebec G1V 0A6, Canada
- Département de Biochimie, de Microbiologie et de Bio-Informatique, Université Laval, Quebec, Quebec G1V 0A6, Canada
- PROTEO, Quebec, Quebec G1V 0A6, Canada
| | - Christian R Landry
- Institut de Biologie Intégrative et des Systèmes, Université Laval, Quebec, Quebec G1V 0A6, Canada
- Département de Biologie, Université Laval, Quebec, Quebec G1V 0A6, Canada
- Département de Biochimie, de Microbiologie et de Bio-Informatique, Université Laval, Quebec, Quebec G1V 0A6, Canada
- PROTEO, Quebec, Quebec G1V 0A6, Canada
| |
|
213
|
Horizontal Gene Transfer as an Indispensable Driver for Evolution of Neocallimastigomycota into a Distinct Gut-Dwelling Fungal Lineage. Appl Environ Microbiol 2019; 85:AEM.00988-19. [PMID: 31126947 DOI: 10.1128/aem.00988-19] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Received: 04/29/2019] [Accepted: 05/19/2019] [Indexed: 01/01/2023] Open
Abstract
Survival and growth of the anaerobic gut fungi (AGF; Neocallimastigomycota) in the herbivorous gut necessitate the possession of multiple abilities absent in other fungal lineages. We hypothesized that horizontal gene transfer (HGT) was instrumental in forging the evolution of AGF into a phylogenetically distinct gut-dwelling fungal lineage. The patterns of HGT were evaluated in the transcriptomes of 27 AGF strains, 22 of which were isolated and sequenced in this study, and 4 AGF genomes broadly covering the breadth of AGF diversity. We identified 277 distinct incidents of HGT in AGF transcriptomes, with subsequent gene duplication resulting in an HGT frequency of 2 to 3.5% in AGF genomes. The majority of HGT events were AGF specific (91.7%) and wide (70.8%), indicating their occurrence at early stages of AGF evolution. The acquired genes allowed AGF to expand their substrate utilization range, provided new venues for electron disposal, augmented their biosynthetic capabilities, and facilitated their adaptation to anaerobiosis. The majority of donors were anaerobic fermentative bacteria prevalent in the herbivorous gut. This study strongly indicates that HGT indispensably forged the evolution of AGF as a distinct fungal phylum and provides a unique example of the role of HGT in shaping the evolution of a high-rank taxonomic eukaryotic lineage. IMPORTANCE The anaerobic gut fungi (AGF) represent a distinct basal phylum lineage (Neocallimastigomycota) commonly encountered in the rumen and alimentary tracts of herbivores. Survival and growth of anaerobic gut fungi in these anaerobic, eutrophic, and prokaryote-dominated habitats necessitates the acquisition of several traits absent in other fungal lineages. We assess here the role of horizontal gene transfer as a relatively fast mechanism for trait acquisition by the Neocallimastigomycota postsequestration in the herbivorous gut. Analysis of 27 transcriptomes that represent the broad diversity of Neocallimastigomycota identified 277 distinct HGT events, with subsequent gene duplication resulting in an HGT frequency of 2 to 3.5% in AGF genomes. These HGT events have allowed AGF to survive in the herbivorous gut by expanding their substrate utilization range, augmenting their biosynthetic pathway, providing new routes for electron disposal by expanding fermentative capacities, and facilitating their adaptation to anaerobiosis. HGT in the AGF is also shown to be mainly a cross-kingdom affair, with the majority of donors belonging to the bacteria. This study represents a unique example of the role of HGT in shaping the evolution of a high-rank taxonomic eukaryotic lineage.
|
214
|
Prabh N, Rödelsperger C. De Novo, Divergence, and Mixed Origin Contribute to the Emergence of Orphan Genes in Pristionchus Nematodes. G3 (BETHESDA, MD.) 2019; 9:2277-2286. [PMID: 31088903 PMCID: PMC6643871 DOI: 10.1534/g3.119.400326] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 02/04/2019] [Accepted: 05/11/2019] [Indexed: 12/30/2022]
Abstract
Homology is a fundamental concept in comparative biology. It is extensively used at the sequence level to make phylogenetic hypotheses and functional inferences. Nonetheless, the majority of eukaryotic genomes contain large numbers of orphan genes lacking homologs in other taxa. Generally, the fraction of orphan genes is higher in genomically undersampled clades, and in the absence of closely related genomes any hypothesis about their origin and evolution remains untestable. Previously, we sequenced ten genomes with an underlying ladder-like phylogeny to establish a phylogenomic framework for studying genome evolution in diplogastrid nematodes. Here, we use this deeply sampled data set to understand the processes that generate orphan genes in our focal species Pristionchus pacificus. Based on phylostratigraphic analysis and additional bioinformatic filters, we obtained 29 high-confidence candidate genes for which mechanisms of orphan origin were proposed based on manual inspection. This revealed diverse mechanisms including annotation artifacts, chimeric origin, alternative reading frame usage, and gene splitting with subsequent gain of de novo exons. In addition, we present two cases of complete de novo origination from non-coding regions, which represents one of the first reports of de novo genes in nematodes. Thus, we conclude that de novo emergence, divergence, and mixed mechanisms contribute to novel gene formation in Pristionchus nematodes.
Affiliation(s)
- Neel Prabh
- Department of Integrative Evolutionary Biology, Max-Planck-Institute for Developmental Biology, Max-Planck-Ring 9, 72076 Tübingen, Germany
- Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Biology, August Thienemann Str. 2, 24306 Plön, Germany
| | - Christian Rödelsperger
- Department of Integrative Evolutionary Biology, Max-Planck-Institute for Developmental Biology, Max-Planck-Ring 9, 72076 Tübingen, Germany
| |
|
215
|
Zhang JY, Zhou Q. On the Regulatory Evolution of New Genes Throughout Their Life History. Mol Biol Evol 2019; 36:15-27. [PMID: 30395322 DOI: 10.1093/molbev/msy206] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 02/07/2023] Open
Abstract
Every gene has a birthplace and an age, that is, a cis-regulatory environment and an evolution lifespan since its origination, yet how the two shape the evolution trajectories of genes remains unclear. Here, we address this basic question by comparing phylogenetically dated new genes in the context of both their ages and origination mechanisms. In both Drosophila and vertebrates, we confirm a clear "out of the testis" transition from the specifically expressed young genes to the broadly expressed old housekeeping genes, observed only in testis but not in other tissues. Many new genes have gained important functions during embryogenesis, manifested as either specific activation at maternal-zygotic transition, or different spatiotemporal expressions from their parental genes. These expression patterns are largely driven by an age-dependent evolution of cis-regulatory environment. We discover that retrogenes are more frequently born in a pre-existing repressive regulatory domain, and are more diverged in their enhancer repertoire than the DNA-based gene duplications. During evolution, new gene duplications gradually gain active histone modifications and undergo more enhancer turnovers when becoming older, but exhibit complex trends of gaining or losing repressive histone modifications in Drosophila or vertebrates, respectively. Interestingly, vertebrate new genes exhibit an "into the testis" epigenetic transition that older genes become more likely to be co-occupied by both active and repressive ("bivalent") histone modifications specifically in testis. Our results uncover the regulatory mechanisms underpinning the stepwise acquisition of novel and complex functions by new genes, and illuminate the general evolution trajectory of genes throughout their life history.
Affiliation(s)
- Jia-Yu Zhang
- MOE Key Laboratory of Biosystems Homeostasis & Protection, Life Sciences Institute, Zhejiang University, Hangzhou, China
| | - Qi Zhou
- MOE Key Laboratory of Biosystems Homeostasis & Protection, Life Sciences Institute, Zhejiang University, Hangzhou, China; Department of Molecular Evolution and Development, University of Vienna, Vienna, Austria
| |
|
216
|
Kubicek CP, Steindorff AS, Chenthamara K, Manganiello G, Henrissat B, Zhang J, Cai F, Kopchinskiy AG, Kubicek EM, Kuo A, Baroncelli R, Sarrocco S, Noronha EF, Vannacci G, Shen Q, Grigoriev IV, Druzhinina IS. Evolution and comparative genomics of the most common Trichoderma species. BMC Genomics 2019; 20:485. [PMID: 31189469 PMCID: PMC6560777 DOI: 10.1186/s12864-019-5680-7] [Citation(s) in RCA: 127] [Impact Index Per Article: 21.2] [Received: 01/05/2019] [Accepted: 04/09/2019] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND The growing importance of the ubiquitous fungal genus Trichoderma (Hypocreales, Ascomycota) requires understanding of its biology and evolution. Many Trichoderma species are used as biofertilizers and biofungicides and T. reesei is the model organism for industrial production of cellulolytic enzymes. In addition, some highly opportunistic species devastate mushroom farms and can become pathogens of humans. A comparative analysis of the first three whole genomes revealed mycoparasitism as the innate feature of Trichoderma. However, the evolution of these traits is not yet understood. RESULTS We selected 12 most commonly occurring Trichoderma species and studied the evolution of their genome sequences. Trichoderma evolved in the time of the Cretaceous-Palaeogene extinction event 66 (±15) mya, but the formation of extant sections (Longibrachiatum, Trichoderma) or clades (Harzianum/Virens) happened in Oligocene. The evolution of the Harzianum clade and section Trichoderma was accompanied by significant gene gain, but the ancestor of section Longibrachiatum experienced rapid gene loss. The highest number of genes gained encoded ankyrins, HET domain proteins and transcription factors. We also identified the Trichoderma core genome, completely curated its annotation, investigated several gene families in detail and compared the results to those of other fungi. Eighty percent of those genes for which a function could be predicted were also found in other fungi, but only 67% of those without a predictable function. CONCLUSIONS Our study presents a time scaled pattern of genome evolution in 12 Trichoderma species from three phylogenetically distant clades/sections and a comprehensive analysis of their genes. The data offer insights in the evolution of a mycoparasite towards a generalist.
Affiliation(s)
- Christian P Kubicek
- Microbiology and Applied Genomics Group, Research Area Biochemical Technology, Institute of Chemical, Environmental & Bioscience Engineering (ICEBE), TU Wien, Vienna, Austria
- , Vienna, Austria
| | - Andrei S Steindorff
- Departamento de Biologia Celular, Universidade de Brasília, Brasília, DF, Brazil
- US Department of Energy Joint Genome Institute, Walnut Creek, CA, USA
| | - Komal Chenthamara
- Microbiology and Applied Genomics Group, Research Area Biochemical Technology, Institute of Chemical, Environmental & Bioscience Engineering (ICEBE), TU Wien, Vienna, Austria
| | - Gelsomina Manganiello
- US Department of Energy Joint Genome Institute, Walnut Creek, CA, USA
- Dipartimento di Agraria, Università degli Studi di Napoli "Federico II", Naples, Portici, Italy
| | - Bernard Henrissat
- CNRS, Aix-Marseille Université, Marseille, France
- INRA, Marseille, France
- Department of Biological Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Jian Zhang
- Jiangsu Provincial Key Lab of Organic Solid Waste Utilization, Nanjing Agricultural University, Nanjing, China
| | - Feng Cai
- Jiangsu Provincial Key Lab of Organic Solid Waste Utilization, Nanjing Agricultural University, Nanjing, China
| | - Alexey G Kopchinskiy
- Microbiology and Applied Genomics Group, Research Area Biochemical Technology, Institute of Chemical, Environmental & Bioscience Engineering (ICEBE), TU Wien, Vienna, Austria
| | | | - Alan Kuo
- US Department of Energy Joint Genome Institute, Walnut Creek, CA, USA
| | - Riccardo Baroncelli
- Centro Hispano-Luso de Investigaciones Agrarias (CIALE), Departamento de Microbiología y Genética, Universidad de Salamanca, Campus de Villamayor, Calle Del Duero, Villamayor, España
| | - Sabrina Sarrocco
- Department of Agriculture, Food and Environment, University of Pisa, Pisa, Italy
| | | | - Giovanni Vannacci
- Centro Hispano-Luso de Investigaciones Agrarias (CIALE), Departamento de Microbiología y Genética, Universidad de Salamanca, Campus de Villamayor, Calle Del Duero, Villamayor, España
| | - Qirong Shen
- Jiangsu Provincial Key Lab of Organic Solid Waste Utilization, Nanjing Agricultural University, Nanjing, China.
| | - Igor V Grigoriev
- US Department of Energy Joint Genome Institute, Walnut Creek, CA, USA.
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA, USA.
| | - Irina S Druzhinina
- Microbiology and Applied Genomics Group, Research Area Biochemical Technology, Institute of Chemical, Environmental & Bioscience Engineering (ICEBE), TU Wien, Vienna, Austria.
- Jiangsu Provincial Key Lab of Organic Solid Waste Utilization, Nanjing Agricultural University, Nanjing, China.
| |
|
217
|
Chekulaeva M, Rajewsky N. Roles of Long Noncoding RNAs and Circular RNAs in Translation. Cold Spring Harb Perspect Biol 2019; 11:cshperspect.a032680. [PMID: 30082465 DOI: 10.1101/cshperspect.a032680] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Indexed: 12/17/2022]
Abstract
Most of the eukaryotic genome is pervasively transcribed, yielding hundreds to thousands of long noncoding RNAs (lncRNAs) and circular RNAs (circRNAs), some of which are well conserved during evolution. Functions have been described for a few lncRNAs and circRNAs but remain elusive for most. Both classes of RNAs play regulatory roles in translation by interacting with messenger RNAs (mRNAs), microRNAs (miRNAs), or mRNA-binding proteins (RBPs), thereby modulating translation in trans. Moreover, although initially defined as noncoding, a number of lncRNAs and circRNAs have recently been reported to contain functional open reading frames (ORFs). Here, we review current understanding of the roles played by lncRNAs and circRNAs in protein synthesis and discuss challenges and open questions in the field.
Affiliation(s)
- Marina Chekulaeva
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, 13125 Berlin, Germany
| | - Nikolaus Rajewsky
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, 13125 Berlin, Germany
| |
|
218
|
Abstract
Genetic, transcriptional, and post-transcriptional variations shape the transcriptome of individual cells, rendering establishing an exhaustive set of reference RNAs a complicated matter. Current reference transcriptomes, which are based on carefully curated transcripts, are lagging behind the extensive RNA variation revealed by massively parallel sequencing. Much may be missed by ignoring this unreferenced RNA diversity. There is plentiful evidence for non-reference transcripts with important phenotypic effects. Although reference transcriptomes are inestimable for gene expression analysis, they may turn limiting in important medical applications. We discuss computational strategies for retrieving hidden transcript diversity.
Affiliation(s)
- Antonin Morillon
- ncRNA, Epigenetic and Genome Fluidity, CNRS UMR 3244, Sorbonne Université, PSL University, Institut Curie, Centre de Recherche, 26 rue d'Ulm, 75248, Paris, France
| | - Daniel Gautheret
- Institute for Integrative Biology of the Cell, CEA, CNRS, Université Paris-Sud, Université Paris Saclay, Gif sur Yvette, France.
| |
|
219
|
Durand É, Gagnon-Arsenault I, Hallin J, Hatin I, Dubé AK, Nielly-Thibault L, Namy O, Landry CR. Turnover of ribosome-associated transcripts from de novo ORFs produces gene-like characteristics available for de novo gene emergence in wild yeast populations. Genome Res 2019; 29:932-943. [PMID: 31152050 PMCID: PMC6581059 DOI: 10.1101/gr.239822.118] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 05/25/2018] [Accepted: 05/13/2019] [Indexed: 12/17/2022]
Abstract
Little is known about the rate of emergence of de novo genes, what their initial properties are, and how they spread in populations. We examined wild yeast populations (Saccharomyces paradoxus) to characterize the diversity and turnover of intergenic ORFs over short evolutionary timescales. We find that hundreds of intergenic ORFs show translation signatures similar to canonical genes, and we experimentally confirmed the translation of many of these ORFs in laboratory conditions using a reporter assay. Compared with canonical genes, intergenic ORFs have lower translation efficiency, which could imply a lack of optimization for translation or a mechanism to reduce their production cost. Translated intergenic ORFs also tend to have sequence properties that are generally close to those of random intergenic sequences. However, some of the very recent translated intergenic ORFs, which appeared <110 kya, already show gene-like characteristics, suggesting that the raw material for functional innovations could appear over short evolutionary timescales.
Affiliation(s)
- Éléonore Durand
- Institut de Biologie Intégrative et des Systèmes, Département de Biologie, PROTEO, Centre de Recherche en Données Massives de l'Université Laval, Pavillon Charles-Eugène-Marchand, Université Laval, G1V 0A6 Québec, Québec, Canada
| | - Isabelle Gagnon-Arsenault
- Institut de Biologie Intégrative et des Systèmes, Département de Biologie, PROTEO, Centre de Recherche en Données Massives de l'Université Laval, Pavillon Charles-Eugène-Marchand, Université Laval, G1V 0A6 Québec, Québec, Canada; Département de Biochimie, Microbiologie et Bio-informatique, Université Laval, G1V 0A6 Québec, Québec, Canada
| | - Johan Hallin
- Institut de Biologie Intégrative et des Systèmes, Département de Biologie, PROTEO, Centre de Recherche en Données Massives de l'Université Laval, Pavillon Charles-Eugène-Marchand, Université Laval, G1V 0A6 Québec, Québec, Canada; Département de Biochimie, Microbiologie et Bio-informatique, Université Laval, G1V 0A6 Québec, Québec, Canada
| | - Isabelle Hatin
- Institut de Biologie Intégrative de la Cellule (I2BC), CEA, CNRS, Université Paris-Sud, Université Paris-Saclay, 91190 Gif sur Yvette, France
| | - Alexandre K Dubé
- Institut de Biologie Intégrative et des Systèmes, Département de Biologie, PROTEO, Centre de Recherche en Données Massives de l'Université Laval, Pavillon Charles-Eugène-Marchand, Université Laval, G1V 0A6 Québec, Québec, Canada; Département de Biochimie, Microbiologie et Bio-informatique, Université Laval, G1V 0A6 Québec, Québec, Canada
| | - Lou Nielly-Thibault
- Institut de Biologie Intégrative et des Systèmes, Département de Biologie, PROTEO, Centre de Recherche en Données Massives de l'Université Laval, Pavillon Charles-Eugène-Marchand, Université Laval, G1V 0A6 Québec, Québec, Canada
| | - Olivier Namy
- Institut de Biologie Intégrative de la Cellule (I2BC), CEA, CNRS, Université Paris-Sud, Université Paris-Saclay, 91190 Gif sur Yvette, France
| | - Christian R Landry
- Institut de Biologie Intégrative et des Systèmes, Département de Biologie, PROTEO, Centre de Recherche en Données Massives de l'Université Laval, Pavillon Charles-Eugène-Marchand, Université Laval, G1V 0A6 Québec, Québec, Canada; Département de Biochimie, Microbiologie et Bio-informatique, Université Laval, G1V 0A6 Québec, Québec, Canada
| |
|
220
|
Wang R, Wang Y, Zhang X, Zhang Y, Du X, Fang Y, Li G. Hierarchical cooperation of transcription factors from integration analysis of DNA sequences, ChIP-Seq and ChIA-PET data. BMC Genomics 2019; 20:296. [PMID: 32039697 PMCID: PMC7226942 DOI: 10.1186/s12864-019-5535-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 01/05/2023] Open
Abstract
BACKGROUND Chromosomal architecture, which is constituted by chromatin loops, plays an important role in cellular functions. Gene expression and cell identity can be regulated by the chromatin loop, which is formed by proximal or distal enhancers and promoters in linear DNA (1D). Enhancers and promoters are fundamental non-coding elements enriched with transcription factors (TFs) to form chromatin loops. However, the specific cooperation of TFs involved in forming chromatin loops is not fully understood. RESULTS Here, we proposed a method for investigating the cooperation of TFs in four cell lines by the integrative analysis of DNA sequences, ChIP-Seq and ChIA-PET data. Results demonstrate that the interaction of enhancers and promoters is a hierarchical and dynamic complex process with cooperative interactions of different TFs synergistically regulating gene expression and chromatin structure. The TF cooperation involved in maintaining and regulating the chromatin loop of cells can be regulated by epigenetic factors, such as other TFs and DNA methylation. CONCLUSIONS Such cooperation among TFs provides the potential features that can affect chromatin’s 3D architecture in cells. The regulation of chromatin 3D organization and gene expression is a complex process associated with the hierarchical and dynamic properties of TFs.
Affiliation(s)
- Ruimin Wang
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, 430070, China
| | - Yunlong Wang
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, 430070, China
| | - Xueying Zhang
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, 430070, China
| | - Yaliang Zhang
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, 430070, China
| | - Xiaoyong Du
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, 430070, China; Huazhong Agricultural University, Wuhan, 430070, China
| | - Yaping Fang
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, 430070, China; Huazhong Agricultural University, Wuhan, 430070, China; College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Guoliang Li
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, 430070, China; Huazhong Agricultural University, Wuhan, 430070, China; College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| |
|
221
|
Xu YC, Niu XM, Li XX, He W, Chen JF, Zou YP, Wu Q, Zhang YE, Busch W, Guo YL. Adaptation and Phenotypic Diversification in Arabidopsis through Loss-of-Function Mutations in Protein-Coding Genes. THE PLANT CELL 2019; 31:1012-1025. [PMID: 30886128 PMCID: PMC6533021 DOI: 10.1105/tpc.18.00791] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 10/18/2018] [Revised: 02/25/2019] [Accepted: 03/17/2019] [Indexed: 05/07/2023]
Abstract
According to the less-is-more hypothesis, gene loss is an engine for evolutionary change. Loss-of-function (LoF) mutations resulting in the natural knockout of protein-coding genes not only provide information about gene function but also play important roles in adaptation and phenotypic diversification. Although the less-is-more hypothesis was proposed two decades ago, it remains to be explored on a large scale. In this study, we identified 60,819 LoF variants in 1071 Arabidopsis (Arabidopsis thaliana) genomes and found that 34% of Arabidopsis protein-coding genes annotated in the Columbia-0 genome do not have any LoF variants. We found that nucleotide diversity, transposable element density, and gene family size are strongly correlated with the presence of LoF variants. Intriguingly, 0.9% of LoF variants with minor allele frequency larger than 0.5% are associated with climate change. In addition, in the Yangtze River basin population, 1% of genes with LoF mutations were under positive selection, providing important insights into the contribution of LoF mutations to adaptation. In particular, our results demonstrate that LoF mutations shape diverse phenotypic traits. Overall, our results highlight the importance of the LoF variants for the adaptation and phenotypic diversification of plants.
Affiliation(s)
- Yong-Chao Xu
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Xiao-Min Niu
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Xin-Xin Li
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Wenrong He
- Salk Institute for Biological Studies, Plant Molecular and Cellular Biology Laboratory, La Jolla, California 92037
- Jia-Fu Chen
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Yu-Pan Zou
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Qiong Wu
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- Yong E Zhang
- University of Chinese Academy of Sciences, Beijing 100049, China
- State Key Laboratory of Integrated Management of Pest Insects and Rodents & Key Laboratory of the Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Wolfgang Busch
- Salk Institute for Biological Studies, Plant Molecular and Cellular Biology Laboratory, La Jolla, California 92037
- Ya-Long Guo
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- University of Chinese Academy of Sciences, Beijing 100049, China
|
222
|
Affiliation(s)
- Stephen Branden Van Oss
- Department of Computational and Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America
- Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America
|
223
|
|
224
|
Mohd-Assaad N, McDonald BA, Croll D. The emergence of the multi-species NIP1 effector in Rhynchosporium was accompanied by high rates of gene duplications and losses. Environ Microbiol 2019; 21:2677-2695. [PMID: 30838748 DOI: 10.1111/1462-2920.14583] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 10/08/2018] [Revised: 02/23/2019] [Accepted: 03/04/2019] [Indexed: 01/28/2023]
Abstract
Plant pathogens secrete effector proteins to manipulate the host and facilitate infection. Cognate hosts trigger strong defence responses upon detection of these effectors. Consequently, pathogens and hosts undergo rapid coevolutionary arms races driven by adaptive evolution of effectors and receptors. Because of their high rate of turnover, most effectors are thought to be species-specific and the evolutionary trajectories are poorly understood. Here, we investigate the necrosis-inducing protein 1 (NIP1) effector in the multihost pathogen genus Rhynchosporium. We retraced the evolutionary history of the NIP1 locus using whole-genome assemblies of 146 strains covering four closely related species. NIP1 orthologues were present in all species but the locus consistently segregated presence-absence polymorphisms suggesting long-term balancing selection. We also identified previously unknown paralogues of NIP1 that were shared among multiple species and showed substantial copy-number variation within R. commune. The NIP1A paralogue was under significant positive selection suggesting that NIP1A is the dominant effector variant coevolving with host immune receptors. Consistent with this prediction, we found that copy number variation at NIP1A had a stronger effect on virulence than NIP1B. Our analyses unravelled the origins and diversification mechanisms of a pathogen effector family shedding light on how pathogens gain adaptive genetic variation.
Affiliation(s)
- Norfarhan Mohd-Assaad
- Plant Pathology, Institute of Integrative Biology, ETH, Zurich, 8092 Zurich, Switzerland; School of Biosciences and Biotechnology, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia
- Bruce A McDonald
- Plant Pathology, Institute of Integrative Biology, ETH, Zurich, 8092 Zurich, Switzerland
- Daniel Croll
- Laboratory of Evolutionary Genetics, Institute of Biology, University of Neuchâtel, 2000 Neuchâtel, Switzerland
|
225
|
Rapid evolution of protein diversity by de novo origination in Oryza. Nat Ecol Evol 2019; 3:679-690. [PMID: 30858588 DOI: 10.1038/s41559-019-0822-5] [Citation(s) in RCA: 85] [Impact Index Per Article: 14.2] [Received: 05/08/2018] [Accepted: 01/23/2019] [Indexed: 12/22/2022]
Abstract
New protein-coding genes that arise de novo from non-coding DNA sequences contribute to protein diversity. However, de novo gene origination is challenging to study as it requires high-quality reference genomes for closely related species, evidence for ancestral non-coding sequences, and transcription and translation of the new genes. High-quality genomes of 13 closely related Oryza species provide unprecedented opportunities to understand de novo origination events. Here, we identify a large number of young de novo genes with discernible recent ancestral non-coding sequences and evidence of translation. Using pipelines examining the synteny relationship between genomes and reciprocal-best whole-genome alignments, we detected at least 175 de novo open reading frames in the focal species O. sativa subspecies japonica, which were all detected in RNA sequencing-based transcriptomes. Mass spectrometry-based targeted proteomics and ribosomal profiling show translational evidence for 57% of the de novo genes. In recent divergence of Oryza, an average of 51.5 de novo genes per million years were generated and retained. We observed evolutionary patterns in which excess indels and early transcription were favoured in origination with a stepwise formation of gene structure. These data reveal that de novo genes contribute to the rapid evolution of protein diversity under positive selection.
|
226
|
Khitun A, Ness TJ, Slavoff SA. Small open reading frames and cellular stress responses. Mol Omics 2019; 15:108-116. [PMID: 30810554 DOI: 10.1039/c8mo00283e] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Indexed: 12/11/2022]
Abstract
Small open reading frames (smORFs) encoding polypeptides of less than 100 amino acids in eukaryotes (50 amino acids in prokaryotes) were historically excluded from genome annotation. However, recent advances in genomics, ribosome footprinting, and proteomics have revealed thousands of translated smORFs in genomes spanning evolutionary space. These smORFs can encode functional polypeptides, or act as cis-translational regulators. Herein we review evidence that some smORF-encoded polypeptides (SEPs) participate in stress responses in both prokaryotes and eukaryotes, and that some upstream ORFs (uORFs) regulate stress-responsive translation of downstream cistrons in eukaryotic cells. These studies provide insight into a regulated subclass of smORFs and suggest that at least some SEPs may participate in maintenance of cellular homeostasis under stress.
Affiliation(s)
- Alexandra Khitun
- Chemical Biology Institute, Yale University, West Haven, CT 06516, USA; Department of Chemistry, Yale University, New Haven, CT 06520, USA
- Travis J Ness
- Chemical Biology Institute, Yale University, West Haven, CT 06516, USA; Department of Chemistry, Yale University, New Haven, CT 06520, USA
- Sarah A Slavoff
- Chemical Biology Institute, Yale University, West Haven, CT 06516, USA; Department of Chemistry, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
|
227
|
Molecular mechanism and history of non-sense to sense evolution of antifreeze glycoprotein gene in northern gadids. Proc Natl Acad Sci U S A 2019; 116:4400-4405. [PMID: 30765531 PMCID: PMC6410882 DOI: 10.1073/pnas.1817138116] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Indexed: 11/18/2022] Open
Abstract
The diverse antifreeze proteins enabling the survival of different polar fishes in freezing seas offer unparalleled vistas into the breadth of genetic sources and mechanisms that produce crucial new functions. Although most new genes evolved from preexisting genic ancestors, some are deemed to have arisen from noncoding DNA. However, the pertinent mechanisms, functions, and selective forces remain uncertain. Our paper presents clear evidence that the antifreeze glycoprotein gene of the northern codfish originated from a noncoding region. We further describe the detailed mechanism of its evolutionary transformation into a full-fledged crucial life-saving gene. This paper is a concrete dissection of the process of a de novo gene birth that has conferred a vital adaptive function directly linked to natural selection.
A fundamental question in evolutionary biology is how genetic novelty arises. De novo gene birth is a recently recognized mechanism, but the evolutionary process and function of putative de novo genes remain largely obscure. With a clear life-saving function, the diverse antifreeze proteins of polar fishes are exemplary adaptive innovations and models for investigating new gene evolution. Here, we report clear evidence and a detailed molecular mechanism for the de novo formation of the northern gadid (codfish) antifreeze glycoprotein (AFGP) gene from a minimal noncoding sequence. We constructed genomic DNA libraries for AFGP-bearing and AFGP-lacking species across the gadid phylogeny and performed fine-scale comparative analyses of the AFGP genomic loci and homologs. We identified the noncoding founder region and a nine-nucleotide (9-nt) element therein that supplied the codons for one Thr-Ala-Ala unit from which the extant repetitive AFGP-coding sequence (cds) arose through tandem duplications. The latent signal peptide (SP)-coding exons were fortuitous noncoding DNA sequence immediately upstream of the 9-nt element, which, when spliced, supplied a typical secretory signal. Through a 1-nt frameshift mutation, these two parts formed a single read-through open reading frame (ORF). It became functionalized when a putative translocation event conferred the essential cis promoter for transcriptional initiation. We experimentally proved that all genic components of the extant gadid AFGP originated from entirely nongenic DNA. The gadid AFGP evolutionary process also represents a rare example of the proto-ORF model of de novo gene birth where a fully formed ORF existed before the regulatory element to activate transcription was acquired.
|
228
|
Wang Y, Zeng Z, Liu TL, Sun L, Yao Q, Chen KP. TA, GT and AC are significantly under-represented in open reading frames of prokaryotic and eukaryotic protein-coding genes. Mol Genet Genomics 2019; 294:637-647. [PMID: 30758669 DOI: 10.1007/s00438-019-01535-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 08/29/2018] [Accepted: 01/31/2019] [Indexed: 01/01/2023]
Abstract
Genomes can be considered a combination of 16 dinucleotides. Analysing the relative abundance of different dinucleotides may reveal important features of genome evolution. In present study, we conducted extensive surveys on the relative abundances of dinucleotides in various genomic components of 28 bacterial, 20 archaean, 19 fungal, 24 plant and 29 animal species. We found that TA, GT and AC are significantly under-represented in open reading frames of all organisms and in intergenic regions and introns of most organisms. Specific dinucleotides are of greatly varied usage at different codon positions. The significantly low representations of TA, GT and AC are considered the evolutionary consequences of preventing formation of pre-mature stop codons and of reducing intron-splicing options in candidate primary mRNA sequences. These data suggest that a reduction of TA and GT occurred on both strands of the DNA sequence at an early stage of de novo gene birth. Interestingly, GT and AC are also significantly under-represented in current prokaryotic genomes, suggesting that ancient prokaryotic protein-coding genes might have contained introns. The greatly varied usages of specific dinucleotides at different codon positions are considered evolutionary accommodations to compensate the unavailability of specific codons and to avoid formation of pre-mature stop codons. This is the first report presenting data of dinucleotide relative abundance to indicate the possible existence of spliceosomal introns in ancient prokaryotic genes and to hypothesize early steps of de novo gene birth.
Affiliation(s)
- Yong Wang
- School of Food and Biological Engineering, Jiangsu University, 301 Xuefu Road, Zhenjiang, 212013, China
- Zhen Zeng
- Institute of Life Sciences, Jiangsu University, 301 Xuefu Road, Zhenjiang, 212013, China
- Tian-Lei Liu
- School of Food and Biological Engineering, Jiangsu University, 301 Xuefu Road, Zhenjiang, 212013, China
- Ling Sun
- School of Food and Biological Engineering, Jiangsu University, 301 Xuefu Road, Zhenjiang, 212013, China
- Qin Yao
- Institute of Life Sciences, Jiangsu University, 301 Xuefu Road, Zhenjiang, 212013, China
- Ke-Ping Chen
- Institute of Life Sciences, Jiangsu University, 301 Xuefu Road, Zhenjiang, 212013, China
|
229
|
Vakirlis N, Hebert AS, Opulente DA, Achaz G, Hittinger CT, Fischer G, Coon JJ, Lafontaine I. A Molecular Portrait of De Novo Genes in Yeasts. Mol Biol Evol 2019; 35:631-645. [PMID: 29220506 DOI: 10.1093/molbev/msx315] [Citation(s) in RCA: 78] [Impact Index Per Article: 13.0] [Indexed: 12/13/2022] Open
Abstract
New genes, with novel protein functions, can evolve "from scratch" out of intergenic sequences. These de novo genes can integrate the cell's genetic network and drive important phenotypic innovations. Therefore, identifying de novo genes and understanding how the transition from noncoding to coding occurs are key problems in evolutionary biology. However, identifying de novo genes is a difficult task, hampered by the presence of remote homologs, fast evolving sequences and erroneously annotated protein coding genes. To overcome these limitations, we developed a procedure that handles the usual pitfalls in de novo gene identification and predicted the emergence of 703 de novo gene candidates in 15 yeast species from 2 genera whose phylogeny spans at least 100 million years of evolution. We validated 85 candidates by proteomic data, providing new translation evidence for 25 of them through mass spectrometry experiments. We also unambiguously identified the mutations that enabled the transition from noncoding to coding for 30 Saccharomyces de novo genes. We established that de novo gene origination is a widespread phenomenon in yeasts, only a few being ultimately maintained by selection. We also found that de novo genes preferentially emerge next to divergent promoters in GC-rich intergenic regions where the probability of finding a fortuitous and transcribed ORF is the highest. Finally, we found a more than 3-fold enrichment of de novo genes at recombination hot spots, which are GC-rich and nucleosome-free regions, suggesting that meiotic recombination contributes to de novo gene emergence in yeasts.
Affiliation(s)
- Nikolaos Vakirlis
- Sorbonne Universités, UPMC Univ Paris 06, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative UMR7238, 75005 Paris, France
- Alex S Hebert
- Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, WI; DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI
- Dana A Opulente
- Laboratory of Genetics, Genome Center of Wisconsin, J. F. Crow Institute for the Study of Evolution, Wisconsin Energy Institute, University of Wisconsin-Madison, Madison, WI
- Guillaume Achaz
- Atelier de BioInformatique, ISyEB UMR7205 Muséum National d'Histoire Naturelle, Paris, France; SMILE Group, CIRB UMR7241, Collège de France, Paris, France
- Chris Todd Hittinger
- DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI; Laboratory of Genetics, Genome Center of Wisconsin, J. F. Crow Institute for the Study of Evolution, Wisconsin Energy Institute, University of Wisconsin-Madison, Madison, WI
- Gilles Fischer
- Sorbonne Universités, UPMC Univ Paris 06, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative UMR7238, 75005 Paris, France
- Joshua J Coon
- Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, WI; DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI; Department of Biomolecular Chemistry, University of Wisconsin-Madison, Madison, WI; Department of Chemistry, University of Wisconsin-Madison, Madison, WI; Morgridge Institute for Research, Madison, WI
- Ingrid Lafontaine
- Atelier de BioInformatique, ISyEB UMR7205 Muséum National d'Histoire Naturelle, Paris, France; Sorbonne Universités, UPMC Univ Paris 06, CNRS, Institut de Biologie Physico-Chimique, Physiologie Membranaire et Moléculaire du Chloroplaste UMR7141, 75005 Paris, France
|
230
|
Claverie JM, Abergel C, Legendre M. [Giant viruses that create their own genes]. Med Sci (Paris) 2019; 34:1087-1091. [PMID: 30623766 DOI: 10.1051/medsci/2018300] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022] Open
Abstract
Since 2003 and the discovery of Mimivirus, the saga of giant viruses continues with the isolation of new amoeba viruses, which are now divided into seven distinct families, the origin (s) of which are still mysterious and controversial. Thanks to the isolation of 3 new members of the Pandoraviridae family, whose micrometric particles and genomes of more than 2 megabases encroach on the cellular world, we carried out a stringent re-analysis of their gene contents, using a combination of transcriptomic, proteomic and bioinformatic approaches. We concluded that the only scenario capable of accounting for the distribution and the huge proportion of orphan genes ("ORFans") that characterize Pandoraviruses is that they were created de novo within the intergenic regions. This process, perhaps shared among other large DNA viruses, challenges the central paradigm of molecular evolution according to which all genes / proteins have an ancestry history.
Affiliation(s)
- Jean-Michel Claverie
- Aix-Marseille université et CNRS, Information génomique et structurale (IGS), UMR7256, Institut de microbiologie de la Méditerranée-IMM-FR 3479, parc scientifique de Luminy, 163, avenue de Luminy, case 934, 13288 Marseille Cedex 09, France
- Chantal Abergel
- Aix-Marseille université et CNRS, Information génomique et structurale (IGS), UMR7256, Institut de microbiologie de la Méditerranée-IMM-FR 3479, parc scientifique de Luminy, 163, avenue de Luminy, case 934, 13288 Marseille Cedex 09, France
- Matthieu Legendre
- Aix-Marseille université et CNRS, Information génomique et structurale (IGS), UMR7256, Institut de microbiologie de la Méditerranée-IMM-FR 3479, parc scientifique de Luminy, 163, avenue de Luminy, case 934, 13288 Marseille Cedex 09, France
|
231
|
McGowan J, Byrne KP, Fitzpatrick DA. Comparative Analysis of Oomycete Genome Evolution Using the Oomycete Gene Order Browser (OGOB). Genome Biol Evol 2019; 11:189-206. [PMID: 30535146 PMCID: PMC6330052 DOI: 10.1093/gbe/evy267] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Accepted: 12/10/2018] [Indexed: 01/01/2023] Open
Abstract
The oomycetes are a class of microscopic, filamentous eukaryotes within the stramenopiles–alveolates–rhizaria eukaryotic supergroup. They include some of the most destructive pathogens of animals and plants, such as Phytophthora infestans, the causative agent of late potato blight. Despite the threat they pose to worldwide food security and natural ecosystems, there is a lack of tools and databases available to study oomycete genetics and evolution. To this end, we have developed the Oomycete Gene Order Browser (OGOB), a curated database that facilitates comparative genomic and syntenic analyses of oomycete species. OGOB incorporates genomic data for 20 oomycete species including functional annotations and a number of bioinformatics tools. OGOB hosts a robust set of orthologous oomycete genes for evolutionary analyses. Here, we present the structure and function of OGOB as well as a number of comparative genomic analyses we have performed to better understand oomycete genome evolution. We analyze the extent of oomycete gene duplication and identify tandem gene duplication as a driving force of the expansion of secreted oomycete genes. We identify core genes that are present and microsyntenically conserved (termed syntenologs) in oomycete lineages and identify the degree of microsynteny between each pair of the 20 species housed in OGOB. Consistent with previous comparative synteny analyses between a small number of oomycete species, our results reveal an extensive degree of microsyntenic conservation amongst genes with housekeeping functions within the oomycetes. OGOB is available at https://ogob.ie.
Affiliation(s)
- Jamie McGowan
- Genome Evolution Laboratory, Department of Biology, Maynooth University, Co. Kildare, Ireland; Human Health Research Institute, Maynooth University, Co. Kildare, Ireland
- Kevin P Byrne
- School of Medicine, UCD Conway Institute, University College Dublin, Ireland
- David A Fitzpatrick
- Genome Evolution Laboratory, Department of Biology, Maynooth University, Co. Kildare, Ireland; Human Health Research Institute, Maynooth University, Co. Kildare, Ireland
|
232
|
Qi M, Zheng W, Zhao X, Hohenstein JD, Kandel Y, O'Conner S, Wang Y, Du C, Nettleton D, MacIntosh GC, Tylka GL, Wurtele ES, Whitham SA, Li L. QQS orphan gene and its interactor NF-YC4 reduce susceptibility to pathogens and pests. Plant Biotechnol J 2019; 17:252-263. [PMID: 29878511 PMCID: PMC6330549 DOI: 10.1111/pbi.12961] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 03/22/2018] [Accepted: 06/04/2018] [Indexed: 05/19/2023]
Abstract
Enhancing the nutritional quality and disease resistance of crops without sacrificing productivity is a key issue for developing varieties that are valuable to farmers and for simultaneously improving food security and sustainability. Expression of the Arabidopsis thaliana species-specific AtQQS (Qua-Quine Starch) orphan gene or its interactor, NF-YC4 (Nuclear Factor Y, subunit C4), has been shown to increase levels of leaf/seed protein without affecting the growth and yield of agronomic species. Here, we demonstrate that overexpression of AtQQS and NF-YC4 in Arabidopsis and soybean enhances resistance/reduces susceptibility to viruses, bacteria, fungi, aphids and soybean cyst nematodes. A series of Arabidopsis mutants in starch metabolism were used to explore the relationships between QQS expression, carbon and nitrogen partitioning, and defense. The enhanced basal defenses mediated by QQS were independent of changes in protein/carbohydrate composition of the plants. We demonstrate that either AtQQS or NF-YC4 overexpression in Arabidopsis and in soybean reduces susceptibility of these plants to pathogens/pests. Transgenic soybean lines overexpressing NF-YC4 produce seeds with increased protein while maintaining healthy growth. Pull-down studies reveal that QQS interacts with human NF-YC, as well as with Arabidopsis NF-YC4, and indicate two QQS binding sites near the NF-YC-histone-binding domain. A new model for QQS interaction with NF-YC is speculated. Our findings illustrate the potential of QQS and NF-YC4 to increase protein and improve defensive traits in crops, overcoming the normal growth-defense trade-offs.
Affiliation(s)
- Mingsheng Qi
- Department of Plant Pathology and Microbiology, Iowa State University, Ames, IA, USA
- Wenguang Zheng
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, USA
- Xuefeng Zhao
- Laurence H. Baker Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, IA, USA
- Jessica D. Hohenstein
- Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA, USA
- Yuba Kandel
- Department of Plant Pathology and Microbiology, Iowa State University, Ames, IA, USA
- Seth O'Conner
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, USA
- Department of Biological Sciences, Mississippi State University, Starkville, MS, USA
- Yifan Wang
- Department of Statistics, Iowa State University, Ames, IA, USA
- Chuanlong Du
- Department of Statistics, Iowa State University, Ames, IA, USA
- Dan Nettleton
- Department of Statistics, Iowa State University, Ames, IA, USA
- Gustavo C. MacIntosh
- Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA, USA
- Gregory L. Tylka
- Department of Plant Pathology and Microbiology, Iowa State University, Ames, IA, USA
- Eve S. Wurtele
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA, USA
- Steven A. Whitham
- Department of Plant Pathology and Microbiology, Iowa State University, Ames, IA, USA
- Ling Li
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, USA
- Department of Biological Sciences, Mississippi State University, Starkville, MS, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA, USA
|
233
|
Abstract
De novo genes, that is, protein-coding genes originating from previously noncoding sequence, have gone from being considered impossibly unlikely to being recognized as an important source of genetic novelty in eukaryotic genomes. It is clear that de novo gene evolution is a rare but consistent feature of eukaryotic genomes, being detected in every genome studied. However, different studies often use different computational methods, and the numbers and identities of the detected genes vary greatly. Here we present a coherent protocol for the computational identification of de novo genes by comparative genomics. The method described uses homology searches, identification of syntenic regions, and ancestral sequence reconstruction to produce high-confidence candidates with robust evidence of de novo emergence. It is designed to be easily applicable given the basic knowledge of bioinformatic tools and scalable so that it can be applied on large and small datasets.
Affiliation(s)
- Nikolaos Vakirlis
- Department of Genetics, Trinity College Dublin, Smurfit Institute of Genetics, University of Dublin, Dublin, Ireland
- Aoife McLysaght
- Department of Genetics, Trinity College Dublin, Smurfit Institute of Genetics, University of Dublin, Dublin, Ireland
|
234
|
Translation of Small Open Reading Frames: Roles in Regulation and Evolutionary Innovation. Trends Genet 2018; 35:186-198. [PMID: 30606460 DOI: 10.1016/j.tig.2018.12.003] [Citation(s) in RCA: 70] [Impact Index Per Article: 10.0] [Received: 10/12/2018] [Accepted: 12/07/2018] [Indexed: 01/01/2023]
Abstract
The translatome can be defined as the sum of the RNA sequences that are translated into proteins in the cell by the ribosomal machinery. Until recently, it was generally assumed that the translatome was essentially restricted to evolutionary conserved proteins encoded by the set of annotated protein-coding genes. However, it has become increasingly clear that it also includes small regulatory open reading frames (ORFs), functional micropeptides, de novo proteins, and the pervasive translation of likely nonfunctional proteins. Many of these ORFs have been discovered thanks to the development of ribosome profiling, a technique to sequence ribosome-protected RNA fragments. To fully capture the diversity of translated ORFs, we propose a comprehensive classification that includes the new types of translated ORFs in addition to standard proteins.
|
235
|
Exaptation at the molecular genetic level. Sci China Life Sci 2018; 62:437-452. [PMID: 30798493 DOI: 10.1007/s11427-018-9447-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 11/10/2018] [Accepted: 12/01/2018] [Indexed: 12/22/2022]
Abstract
The realization that body parts of animals and plants can be recruited or coopted for novel functions dates back to, or even predates, the observations of Darwin. S.J. Gould and E.S. Vrba recognized a mode of evolution of characters that differs from adaptation. The umbrella term aptation was supplemented with the concept of exaptation. Unlike adaptations, which are restricted to features built by selection for their current role, exaptations are features that currently enhance fitness, even though their present role was not a result of natural selection. Exaptations can also arise from nonaptations; these are characters which had previously been evolving neutrally. All nonaptations are potential exaptations. The concept of exaptation was expanded to the molecular genetic level which aided greatly in understanding the enormous potential of neutrally evolving repetitive DNA (including transposed elements, formerly considered junk DNA) for the evolution of genes and genomes. The distinction between adaptations and exaptations is outlined in this review and examples are given. Also elaborated on is the fact that such distinctions are sometimes more difficult to determine; this is a widespread phenomenon in biology, where continua abound and clear borders between states and definitions are rare.
|
236
|
Wirthlin M, Lima NCB, Guedes RLM, Soares AER, Almeida LGP, Cavaleiro NP, Loss de Morais G, Chaves AV, Howard JT, Teixeira MDM, Schneider PN, Santos FR, Schatz MC, Felipe MS, Miyaki CY, Aleixo A, Schneider MPC, Jarvis ED, Vasconcelos ATR, Prosdocimi F, Mello CV. Parrot Genomes and the Evolution of Heightened Longevity and Cognition. Curr Biol 2018; 28:4001-4008.e7. [PMID: 30528582 DOI: 10.1016/j.cub.2018.10.050] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 02/11/2016] [Revised: 08/14/2018] [Accepted: 10/23/2018] [Indexed: 10/27/2022]
Abstract
Parrots are one of the most distinct and intriguing groups of birds, with highly expanded brains [1], highly developed cognitive [2] and vocal communication [3] skills, and a long lifespan compared to other similar-sized birds [4]. Yet the genetic basis of these traits remains largely unidentified. To address this question, we have generated a high-coverage, annotated assembly of the genome of the blue-fronted Amazon (Amazona aestiva) and carried out extensive comparative analyses with 30 other avian species, including 4 additional parrots. We identified several genomic features unique to parrots, including parrot-specific novel genes and parrot-specific modifications to coding and regulatory sequences of existing genes. We also discovered genomic features under strong selection in parrots and other long-lived birds, including genes previously associated with lifespan determination as well as several hundred new candidate genes. These genes support a range of cellular functions, including telomerase activity; DNA damage repair; control of cell proliferation, cancer, and immunity; and anti-oxidative mechanisms. We also identified brain-expressed, parrot-specific paralogs with known functions in neural development or vocal-learning brain circuits. Intriguingly, parrot-specific changes in conserved regulatory sequences were overwhelmingly associated with genes that are linked to cognitive abilities and have undergone similar selection in the human lineage, suggesting convergent evolution. These findings bring novel insights into the genetics and evolution of longevity and cognition, as well as provide novel targets for exploring the mechanistic basis of these traits.
Affiliation(s)
- Morgan Wirthlin
  - Department of Behavioral Neuroscience, Oregon Health & Science University, Portland, OR 97239, USA
- Nicholas C B Lima
  - Laboratório de Genômica e Biodiversidade, Instituto de Bioquímica Médica Leopoldo de Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ 21941-902, Brazil
- Rafael Lucas Muniz Guedes
  - Laboratório Nacional de Computação Científica, Rua Getúlio Vargas 333, Quitandinha, Petrópolis, RJ 25651-070, Brazil
- André E R Soares
  - Laboratório Nacional de Computação Científica, Rua Getúlio Vargas 333, Quitandinha, Petrópolis, RJ 25651-070, Brazil
- Luiz Gonzaga P Almeida
  - Laboratório Nacional de Computação Científica, Rua Getúlio Vargas 333, Quitandinha, Petrópolis, RJ 25651-070, Brazil
- Nathalia P Cavaleiro
  - Laboratório Nacional de Computação Científica, Rua Getúlio Vargas 333, Quitandinha, Petrópolis, RJ 25651-070, Brazil
- Guilherme Loss de Morais
  - Laboratório Nacional de Computação Científica, Rua Getúlio Vargas 333, Quitandinha, Petrópolis, RJ 25651-070, Brazil
- Anderson V Chaves
  - Programa de Pós-graduação em Manejo e Conservação de Ecossistemas Naturais e Agrários, Instituto de Ciências Biológicas e da Saúde, Universidade Federal de Viçosa, Florestal, Minas Gerais, Brazil
- Jason T Howard
  - Laboratory of Neurogenetics of Language, Rockefeller University, New York, NY 10065, USA; Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
- Marcus de Melo Teixeira
  - Núcleo de Medicina Tropical, Faculdade de Medicina, Universidade de Brasília, Brasília, DF 70910-900, Brazil
- Patricia N Schneider
  - Instituto de Ciências Biológicas, Universidade Federal do Pará, Belém, PA, Brazil
- Fabrício R Santos
  - Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
- Michael C Schatz
  - Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Maria Sueli Felipe
  - Programa de Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília e Depto. de Biologia Celular, Universidade de Brasilia, Brasilia, DF, Brazil
- Cristina Y Miyaki
  - Instituto de Biociências, Universidade de São Paulo, R. do Matão, 277, São Paulo, SP 05508-090, Brazil
- Alexandre Aleixo
  - Coordenação de Zoologia, Museu Paraense Emilio Goeldi, Belém, PA 66040-170, Brazil
- Maria P C Schneider
  - Instituto de Ciências Biológicas, Universidade Federal do Pará, Belém, PA, Brazil
- Erich D Jarvis
  - Laboratory of Neurogenetics of Language, Rockefeller University, New York, NY 10065, USA; Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
- Ana Tereza R Vasconcelos
  - Laboratório Nacional de Computação Científica, Rua Getúlio Vargas 333, Quitandinha, Petrópolis, RJ 25651-070, Brazil
- Francisco Prosdocimi
  - Laboratório de Genômica e Biodiversidade, Instituto de Bioquímica Médica Leopoldo de Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ 21941-902, Brazil
- Claudio V Mello
  - Department of Behavioral Neuroscience, Oregon Health & Science University, Portland, OR 97239, USA
|
237
|
Casola C. From De Novo to "De Nono": The Majority of Novel Protein-Coding Genes Identified with Phylostratigraphy Are Old Genes or Recent Duplicates. Genome Biol Evol 2018; 10:2906-2918. [PMID: 30346517 PMCID: PMC6239577 DOI: 10.1093/gbe/evy231] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Accepted: 10/10/2018] [Indexed: 12/11/2022] Open
Abstract
The evolution of novel protein-coding genes from noncoding regions of the genome is one of the most compelling pieces of evidence for genetic innovations in nature. One popular approach to identify de novo genes is phylostratigraphy, which consists of determining the approximate time of origin (age) of a gene based on its distribution along a species phylogeny. Several studies have revealed significant flaws in determining the age of genes, including de novo genes, using phylostratigraphy alone. However, the rate of false positives in de novo gene surveys, based on phylostratigraphy, remains unknown. Here, I reanalyze the findings from three studies, two of which identified tens to hundreds of rodent-specific de novo genes adopting a phylostratigraphy-centered approach. Most putative de novo genes discovered in these investigations are no longer included in recently updated mouse gene sets. Using a combination of synteny information and sequence similarity searches, I show that ∼60% of the remaining 381 putative de novo genes share homology with genes from other vertebrates, originated through gene duplication, and/or share no synteny information with nonrodent mammals. These results led to an estimated rate of ∼12 de novo genes per million years in mouse. Contrary to a previous study (Wilson BA, Foy SG, Neme R, Masel J. 2017. Young genes are highly disordered as predicted by the preadaptation hypothesis of de novo gene birth. Nat Ecol Evol. 1:0146), I found no evidence supporting the preadaptation hypothesis of de novo gene formation. Nearly half of the de novo genes confirmed in this study are within older genes, indicating that co-option of preexisting regulatory regions and a higher GC content may facilitate the origin of novel genes.
Affiliation(s)
- Claudio Casola
  - Department of Ecosystem Science and Management, Texas A&M University
|
238
|
Caetano-Anollés G, Nasir A, Kim KM, Caetano-Anollés D. Rooting Phylogenies and the Tree of Life While Minimizing Ad Hoc and Auxiliary Assumptions. Evol Bioinform Online 2018; 14:1176934318805101. [PMID: 30364468 PMCID: PMC6196624 DOI: 10.1177/1176934318805101] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 01/25/2018] [Accepted: 09/05/2018] [Indexed: 12/25/2022] Open
Abstract
Phylogenetic methods unearth evolutionary history when supported by three starting points of reason: (1) the continuity axiom begs the existence of a "model" of evolutionary change, (2) the singularity axiom defines the historical ground plan (phylogeny) in which biological entities (taxa) evolve, and (3) the memory axiom demands identification of biological attributes (characters) with historical information. Axiom consequences are interlinked, making the retrodiction enterprise an endeavor of reciprocal fulfillment. In particular, establishing direction of evolutionary change (character polarization) roots phylogenies and enables testing the existence of historical memory (homology). Unfortunately, rooting phylogenies, especially the "tree of life," generally follow narratives instead of integrating empirical and theoretical knowledge of retrodictive exploration. This stems mostly from a focus on molecular sequence analysis and uncertainties about rooting methods. Here, we review available rooting criteria, highlighting the need to minimize both ad hoc and auxiliary assumptions, especially argumentative ad hocness. We show that while the outgroup comparison method has been widely adopted, the generality criterion of nesting and additive phylogenetic change embodied in Weston rule offers the most powerful rooting approach. We also propose a change of focus, from phylogenies that describe the evolution of biological systems to those that describe the evolution of parts of those systems. This weakens violation of character independence, helps formalize the generality criterion of rooting, and provides new ways to study the problem of evolution.
Affiliation(s)
- Gustavo Caetano-Anollés
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Arshan Nasir
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - Department of Biosciences, COMSATS University Islamabad, Islamabad, Pakistan
- Kyung Mo Kim
  - Division of Polar Life Sciences, Korea Polar Research Institute, Incheon, Republic of Korea
- Derek Caetano-Anollés
  - Department of Evolutionary Genetics, Max-Planck-Institut für Evolutionsbiologie, Plön, Germany
|
239
|
Silar P, Dauget JM, Gautier V, Grognet P, Chablat M, Hermann-Le Denmat S, Couloux A, Wincker P, Debuchy R. A gene graveyard in the genome of the fungus Podospora comata. Mol Genet Genomics 2018; 294:177-190. [PMID: 30288581 DOI: 10.1007/s00438-018-1497-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 04/05/2018] [Accepted: 09/28/2018] [Indexed: 02/07/2023]
Abstract
Mechanisms involved in fine adaptation of fungi to their environment include differential gene regulation associated with single nucleotide polymorphisms and indels (including transposons), horizontal gene transfer, gene copy amplification, as well as pseudogenization and gene loss. The two Podospora genome sequences examined here emphasize the role of pseudogenization and gene loss, which have rarely been documented in fungi. Podospora comata is a species closely related to Podospora anserina, a fungus used as a model in several laboratories. Comparison of the genome of P. comata with that of P. anserina, whose genome has been available for over 10 years, should yield interesting data on the modalities of genome evolution between these two closely related fungal species, which thrive in the same type of biotope, i.e., herbivore dung. Here, we present the genome sequence of the mat + isolate of the P. comata reference strain T. Comparison with the genome of the mat + isolate of P. anserina strain S confirms that P. anserina and P. comata are likely two different species that rarely interbreed in nature. Despite having 94-99% nucleotide identity in the syntenic regions of their genomes, the two species differ in nearly 10% of their gene content. Comparison of the species-specific gene sets uncovered genes that could be responsible for the known physiological differences between the two species. Finally, we identified 428 and 811 pseudogenes (3.8 and 7.2% of the genes) in P. anserina and P. comata, respectively. The presence of high numbers of pseudogenes supports the notion that the difference in gene content is due to gene loss rather than horizontal gene transfer. We propose that the high frequency of pseudogenization leading to gene loss in P. anserina and P. comata accompanies specialization of these two fungi. Gene loss may be more prevalent during the evolution of other fungi than usually thought.
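As a rough consistency check on the figures quoted in this abstract, the pseudogene counts and percentages can be inverted to estimate the total gene count each percentage was computed from. The sketch below is illustrative only: the function name is hypothetical, and because the published percentages are rounded to one decimal place, the implied totals are approximate.

```python
def implied_gene_count(pseudogenes: int, percent_of_genes: float) -> int:
    """Estimate the total gene count implied by a pseudogene count and
    the percentage of genes it represents (hypothetical helper)."""
    return round(pseudogenes / (percent_of_genes / 100.0))


# Figures taken from the abstract: 428 pseudogenes = 3.8% (P. anserina),
# 811 pseudogenes = 7.2% (P. comata).
anserina_total = implied_gene_count(428, 3.8)
comata_total = implied_gene_count(811, 7.2)

print("P. anserina implied gene count:", anserina_total)
print("P. comata implied gene count:", comata_total)
```

Under this reading, both percentages imply gene sets of nearly the same size, so the higher pseudogene share in P. comata would reflect more pseudogenes rather than a smaller denominator.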
Affiliation(s)
- Philippe Silar
  - Univ Paris Diderot, Sorbonne Paris Cité, Laboratoire Interdisciplinaire des Energies de Demain, 75205, Paris Cedex 13, France
- Jean-Marc Dauget
  - Univ Paris Diderot, Sorbonne Paris Cité, Laboratoire Interdisciplinaire des Energies de Demain, 75205, Paris Cedex 13, France
- Valérie Gautier
  - Univ Paris Diderot, Sorbonne Paris Cité, Laboratoire Interdisciplinaire des Energies de Demain, 75205, Paris Cedex 13, France
- Pierre Grognet
  - Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris-Sud, Université Paris-Saclay, 91198, Gif-sur-Yvette cedex, France
- Michelle Chablat
  - Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris-Sud, Université Paris-Saclay, 91198, Gif-sur-Yvette cedex, France
- Sylvie Hermann-Le Denmat
  - Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris-Sud, Université Paris-Saclay, 91198, Gif-sur-Yvette cedex, France; Ecole Normale Supérieure, 75005, Paris, France
- Arnaud Couloux
  - CEA, Genoscope, Institut de biologie François Jacob, CP 5706, Evry, France
- Patrick Wincker
  - CEA, Genoscope, Institut de biologie François Jacob, CP 5706, Evry, France; CNRS UMR 8030, Evry, France; Univ. Evry, Université Paris-Saclay, Evry, France
- Robert Debuchy
  - Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris-Sud, Université Paris-Saclay, 91198, Gif-sur-Yvette cedex, France
|
240
|
Hellen CUT. Translation Termination and Ribosome Recycling in Eukaryotes. Cold Spring Harb Perspect Biol 2018; 10:cshperspect.a032656. [PMID: 29735640 DOI: 10.1101/cshperspect.a032656] [Citation(s) in RCA: 129] [Impact Index Per Article: 18.4] [Indexed: 12/13/2022]
Abstract
Termination of mRNA translation occurs when a stop codon enters the A site of the ribosome, and in eukaryotes is mediated by release factors eRF1 and eRF3, which form a ternary eRF1/eRF3-guanosine triphosphate (GTP) complex. eRF1 recognizes the stop codon, and after hydrolysis of GTP by eRF3, mediates release of the nascent peptide. The post-termination complex is then disassembled, enabling its constituents to participate in further rounds of translation. Ribosome recycling involves splitting of the 80S ribosome by the ATP-binding cassette protein ABCE1 to release the 60S subunit. Subsequent dissociation of deacylated transfer RNA (tRNA) and messenger RNA (mRNA) from the 40S subunit may be mediated by initiation factors (priming the 40S subunit for initiation), by ligatin (eIF2D) or by density-regulated protein (DENR) and multiple copies in T-cell lymphoma-1 (MCT1). These events may be subverted by suppression of termination (yielding carboxy-terminally extended read-through polypeptides) or by interruption of recycling, leading to reinitiation of translation near the stop codon.
Affiliation(s)
- Christopher U T Hellen
  - Department of Cell Biology, State University of New York, Downstate Medical Center, New York, New York 11203
|
241
|
Werner MS, Sieriebriennikov B, Prabh N, Loschko T, Lanz C, Sommer RJ. Young genes have distinct gene structure, epigenetic profiles, and transcriptional regulation. Genome Res 2018; 28:1675-1687. [PMID: 30232198 PMCID: PMC6211652 DOI: 10.1101/gr.234872.118] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 01/24/2018] [Accepted: 09/05/2018] [Indexed: 12/22/2022]
Abstract
Species-specific, new, or "orphan" genes account for 10%-30% of eukaryotic genomes. Although initially considered to have limited function, an increasing number of orphan genes have been shown to provide important phenotypic innovation. How new genes acquire regulatory sequences for proper temporal and spatial expression is unknown. Orphan gene regulation may rely in part on origination in open chromatin adjacent to preexisting promoters, although this has not yet been assessed by genome-wide analysis of chromatin states. Here, we combine taxon-rich nematode phylogenies with Iso-Seq, RNA-seq, ChIP-seq, and ATAC-seq to identify the gene structure and epigenetic signature of orphan genes in the satellite model nematode Pristionchus pacificus. Consistent with previous findings, we find young genes are shorter, contain fewer exons, and are on average less strongly expressed than older genes. However, the subset of orphan genes that are expressed exhibit distinct chromatin states from similarly expressed conserved genes. Orphan gene transcription is determined by a lack of repressive histone modifications, confirming long-held hypotheses that open chromatin is important for new gene formation. Yet orphan gene start sites more closely resemble enhancers defined by H3K4me1, H3K27ac, and ATAC-seq peaks, in contrast to conserved genes that exhibit traditional promoters defined by H3K4me3 and H3K27ac. Although the majority of orphan genes are located on chromosome arms that contain high recombination rates and repressive histone marks, strongly expressed orphan genes are more randomly distributed. Our results support a model of new gene origination by rare integration into open chromatin near enhancers.
Affiliation(s)
- Michael S Werner
  - Department of Evolutionary Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
- Bogdan Sieriebriennikov
  - Department of Evolutionary Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
- Neel Prabh
  - Department of Evolutionary Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
- Tobias Loschko
  - Department of Evolutionary Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
- Christa Lanz
  - Department of Evolutionary Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
- Ralf J Sommer
  - Department of Evolutionary Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
|
242
|
Incipient de novo genes can evolve from frozen accidents that escaped rapid transcript turnover. Nat Ecol Evol 2018; 2:1626-1632. [DOI: 10.1038/s41559-018-0639-7] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Received: 07/24/2017] [Accepted: 07/09/2018] [Indexed: 11/08/2022]
|
243
|
Willis S, Masel J. Gene Birth Contributes to Structural Disorder Encoded by Overlapping Genes. Genetics 2018; 210:303-313. [PMID: 30026186 PMCID: PMC6116962 DOI: 10.1534/genetics.118.301249] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 06/11/2018] [Accepted: 07/18/2018] [Indexed: 11/18/2022] Open
Abstract
The same nucleotide sequence can encode two protein products in different reading frames. Overlapping gene regions encode higher levels of intrinsic structural disorder (ISD) than nonoverlapping genes (39% vs. 25% in our viral dataset). This might be because of the intrinsic properties of the genetic code, because one member per pair was recently born de novo in a process that favors high ISD, or because high ISD relieves increased evolutionary constraint imposed by dual-coding. Here, we quantify the relative contributions of these three alternative hypotheses. We estimate that the recency of de novo gene birth explains [Formula: see text] or more of the elevation in ISD in overlapping regions of viral genes. While the two reading frames within a same-strand overlapping gene pair have markedly different ISD tendencies that must be controlled for, their effects cancel out to make no net contribution to ISD. The remaining elevation of ISD in the older members of overlapping gene pairs, presumed due to the need to alleviate evolutionary constraint, was already present prior to the origin of the overlap. Same-strand overlapping gene birth events can occur in two different frames, favoring high ISD either in the ancestral gene or in the novel gene; surprisingly, most de novo gene birth events contained completely within the body of an ancestral gene favor high ISD in the ancestral gene (23 phylogenetically independent events vs. 1). This can be explained by mutation bias favoring the frame with more start codons and fewer stop codons.
Affiliation(s)
- Sara Willis
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721
- Joanna Masel
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721
|
244
|
Bekpen C, Xie C, Tautz D. Dealing with the adaptive immune system during de novo evolution of genes from intergenic sequences. BMC Evol Biol 2018; 18:121. [PMID: 30075701 PMCID: PMC6091031 DOI: 10.1186/s12862-018-1232-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 01/24/2018] [Accepted: 07/16/2018] [Indexed: 12/26/2022] Open
Abstract
Background: The adaptive immune system of vertebrates has an extraordinary potential to sense and neutralize foreign antigens entering the body. De novo evolution of genes implies that the genome itself expresses novel antigens from intergenic sequences, which could cause a problem with this immune system. Peptides from these novel proteins could be presented by the major histocompatibility complex (MHC) receptors at the cell surface and would be recognized as foreign. The respective cells would then be attacked and destroyed, or would cause inflammatory responses. Hence, de novo expressed peptides have to be introduced to the immune system as self-peptides to avoid such autoimmune reactions. The regulation of the distinction between self and non-self starts during embryonic development, but continues late into adulthood. It is mostly mediated by specialized cells in the thymus, but can also be conveyed in peripheral tissues, such as the lymph nodes and the spleen. The self-antigens need to be exposed to the reactive T-cells, which requires the expression of the genes in the respective tissues. Since the initial activation of a promoter for new intergenic transcription of a de novo gene could occur in any tissue, we should expect that the evolutionary establishment of a de novo gene in animals with an adaptive immune system should also involve expression in at least one of the tissues that confer self-recognition. Results: We have studied this question by analyzing the transcriptomes of multiple tissues from young mice in three closely related natural populations of the house mouse (M. m. domesticus). We find that new intergenic transcription indeed occurs mostly in only a single tissue. When a second tissue becomes involved, thymus and spleen are significantly overrepresented.
Conclusions: We conclude that the inclusion of de novo transcripts in the processes for the induction of self-tolerance is indeed an important step in the evolution of functional de novo genes in vertebrates.
Affiliation(s)
- Cemalettin Bekpen
  - Max-Planck Institute for Evolutionary Biology, August-Thienemannstr. 2, 24306, Plön, Germany
- Chen Xie
  - Max-Planck Institute for Evolutionary Biology, August-Thienemannstr. 2, 24306, Plön, Germany
- Diethard Tautz
  - Max-Planck Institute for Evolutionary Biology, August-Thienemannstr. 2, 24306, Plön, Germany
|
245
|
Liao Y, Zhang X, Li B, Liu T, Chen J, Bai Z, Wang M, Shi J, Walling JG, Wing RA, Jiang J, Chen M. Comparison of Oryza sativa and Oryza brachyantha Genomes Reveals Selection-Driven Gene Escape from the Centromeric Regions. Plant Cell 2018; 30:1729-1744. [PMID: 29967288 PMCID: PMC6139686 DOI: 10.1105/tpc.18.00163] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 02/23/2018] [Revised: 05/23/2018] [Accepted: 06/28/2018] [Indexed: 05/03/2023]
Abstract
Centromeres are dynamic chromosomal regions, and the genetic and epigenetic environment of the centromere is often regarded as oppressive to protein-coding genes. Here, we used comparative genomic and phylogenomic approaches to study the evolution of centromeres and centromere-linked genes in the genus Oryza. We report a 12.4-Mb high-quality BAC-based pericentromeric assembly for Oryza brachyantha, which diverged from cultivated rice (Oryza sativa) ∼15 million years ago. The synteny analyses reveal seven medium (>50 kb) pericentric inversions in O. sativa and 10 in O. brachyantha. Of these inversions, three resulted in centromere movement (Chr1, Chr7, and Chr9). Additionally, we identified a potential centromere-repositioning event, in which the ancestral centromere on chromosome 12 in O. brachyantha jumped ∼400 kb away, possibly mediated by a duplicated transposition event (>28 kb). More strikingly, we observed an excess of syntenic gene loss at and near the centromeric regions (P < 2.2 × 10⁻¹⁶). Most (33/47) of the missing genes moved to other genomic regions; therefore, such excess could be explained by the selective loss of the copy in or near centromeric regions after gene duplication. The pattern of gene loss immediately adjacent to centromeric regions suggests centromere chromatin dynamics (e.g., spreading or microrepositioning) may drive such gene loss.
Affiliation(s)
- Yi Liao
  - State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing, China
- Xuemei Zhang
  - State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing, China
- Bo Li
  - State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- Tieyan Liu
  - State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- Jinfeng Chen
  - State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- Zetao Bai
  - State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing, China
- Meijiao Wang
  - State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing, China
- Jinfeng Shi
  - State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- Jason G Walling
  - USDA-ARS-MWA-Cereal Crops Research Unit, Madison, Wisconsin 53726
- Rod A Wing
  - Arizona Genomics Institute, School of Plant Sciences, BIO5 Institute, University of Arizona, Tucson, Arizona 85721
- Jiming Jiang
  - Department of Horticulture, University of Wisconsin-Madison, Madison, Wisconsin 53706
  - Department of Plant Biology, Department of Horticulture, Michigan State University, East Lansing, Michigan 48824
- Mingsheng Chen
  - State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing, China
|
246
|
Moyers BA, Zhang J. Toward Reducing Phylostratigraphic Errors and Biases. Genome Biol Evol 2018; 10:2037-2048. [PMID: 30060201 PMCID: PMC6105108 DOI: 10.1093/gbe/evy161] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Accepted: 07/28/2018] [Indexed: 01/03/2023] Open
Abstract
Phylostratigraphy is a method for estimating gene age, usually applied to large numbers of genes in order to detect nonrandom age distributions of gene properties that could shed light on mechanisms of gene origination and evolution. However, phylostratigraphy underestimates gene age with a nonnegligible probability. The underestimation is more severe for genes with certain properties, creating spurious age distributions of these properties and of those correlated with them. Here we explore three strategies to reduce phylostratigraphic error/bias. First, we test several alternative homology detection methods (PSIBLAST, HMMER, PHMMER, OMA, and GLAM2Scan) in phylostratigraphy, but fail to find any that noticeably outperforms the commonly used BLASTP. Second, using machine learning, we look for predictors of error-prone genes to exclude from phylostratigraphy, but cannot identify reliable predictors. Finally, we remove from phylostratigraphic analysis genes exhibiting errors in simulation, which by definition minimizes error/bias if the simulation is sufficiently realistic. Using this last approach, we show that some previously reported phylostratigraphic trends (e.g., younger proteins tend to evolve more rapidly and be shorter) disappear or even reverse, reconfirming the necessity of controlling phylostratigraphic error/bias. Taken together, our analyses demonstrate that phylostratigraphic errors/biases are refractory to several potential solutions but can be controlled at least partially by the exclusion of error-prone genes identified via realistic simulations. These results are expected to stimulate the judicious use of error-aware phylostratigraphy and reevaluation of previous phylostratigraphic findings.
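The age-assignment step that this abstract refers to can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the stratum names, the toy homology hits, and the function name are all assumptions made for the example, and a real analysis would derive the hits from a homology search such as BLASTP. The last lines show why a missed distant homolog makes a gene look younger than it is.

```python
from typing import List, Set

# Hypothetical phylostrata, ordered from oldest (index 0) to youngest.
PHYLOSTRATA: List[str] = [
    "cellular organisms", "Eukaryota", "Metazoa", "Mammalia", "Rodentia",
]


def assign_phylostratum(hit_strata: Set[str],
                        strata: List[str] = PHYLOSTRATA) -> str:
    """Return the oldest stratum in which a homolog was detected.

    A gene with no detected homolog outside the focal lineage is
    assigned to the youngest stratum.
    """
    for stratum in strata:  # scan from oldest to youngest
        if stratum in hit_strata:
            return stratum
    return strata[-1]


# Toy homology hits (illustrative, not real data).
toy_hits = {
    "geneA": {"Eukaryota", "Metazoa", "Rodentia"},
    "geneB": {"Rodentia"},
    "geneC": set(),
}
ages = {gene: assign_phylostratum(hits) for gene, hits in toy_hits.items()}

# If the search misses geneA's distant (Eukaryota) homolog, the same gene
# is assigned a younger stratum: the age is underestimated.
underestimated = assign_phylostratum(toy_hits["geneA"] - {"Eukaryota"})

print(ages)
print("geneA with missed distant hit:", underestimated)
```

Because the assignment depends only on the oldest detected hit, any property that makes distant homologs harder to detect shifts those genes toward younger strata, which is the source of bias the abstract describes.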
Affiliation(s)
- Bryan A Moyers
  - HudsonAlpha Institute for Biotechnology, Huntsville, Alabama
- Jianzhi Zhang
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan
|
247
|
Abstract
De novo genes are very important for evolutionary innovation. However, how these genes originate and spread remains largely unknown. To better understand this, we rigorously searched for de novo genes in Saccharomyces cerevisiae S288C and examined their spread and fixation in the population. Here, we identified 84 de novo genes that arose in S. cerevisiae S288C since its divergence from its sister groups. Transcriptome and ribosome profiling data revealed at least 8 (10%) and 28 (33%) de novo genes being expressed and translated only under specific conditions, respectively. DNA microarray data, based on 2-fold change, showed that 87% of the de novo genes are regulated during various biological processes, such as nutrient utilization and sporulation. Our comparative and evolutionary analyses further revealed that some factors, including single nucleotide polymorphism (SNP)/indel mutation, high GC content, and DNA shuffling, contribute to the birth of de novo genes, while domestication and natural selection drive the spread and fixation of these genes. Finally, we also provide evidence suggesting the possible parallel origin of a de novo gene between S. cerevisiae and Saccharomyces paradoxus. Together, our study provides several new insights into the origin and spread of de novo genes.
Emergence of de novo genes has occurred in many lineages during evolution, but the birth, spread, and function of these genes remain unresolved. Here we have searched for de novo genes from Saccharomyces cerevisiae S288C using rigorous methods, which reduced the effects of poor annotation and genomic gaps on the identification of de novo genes. Through this analysis, we have found 84 new genes originating de novo from previously noncoding regions, 87% of which are very likely involved in various biological processes. We noticed that 10% and 33% of de novo genes were only expressed and translated under specific conditions; therefore, verification of de novo genes through transcriptome and ribosome profiling, especially from limited expression data, may underestimate the number of bona fide new genes. We further show that SNP/indel mutation, high GC content, and DNA shuffling could be involved in the birth of de novo genes, while domestication and natural selection drive the spread and fixation of these genes. Finally, we provide evidence suggesting the possible parallel origin of a new gene.
|
248
|
Li Z, Wan X. Long-term evolutionary DNA methylation dynamic of protein-coding genes and its underlying mechanism. Gene 2018; 677:96-104. [PMID: 30031907 DOI: 10.1016/j.gene.2018.07.051] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/18/2018] [Revised: 07/05/2018] [Accepted: 07/18/2018] [Indexed: 10/28/2022]
Abstract
DNA methylation is an important type of epigenetic modification for the maintenance of genome functionality and stability. Although there are many studies on DNA methylation patterns, mechanisms, and functions, no study has focused on the evolutionary dynamic of DNA methylation. Here, we present the first genome-wide pattern of evolutionary DNA methylation dynamic in protein-coding genes, by grouping the Arabidopsis thaliana protein-coding genes into several conservation levels representing different evolutionary ages, and by investigating their DNA methylation features for three methylation contexts in both genic and flanking regions. The main results, over a long-term evolutionary period, include: (1) genic CHG and CHH methylation levels tend to decrease over time, which is mainly due to reductions in the number of siRNA target sites in genes; (2) genic CG methylation levels first decrease and then increase on average over evolutionary time, which is the combined result of an increased proportion and a decreased CG methylation level of CG-methylated genes; and (3) increased gene length and the stochastic methylation mechanism in the CG context may further account for the genic CG methylation trend in evolution. The diverse DNA methylation mechanisms in different contexts, together with altered gene length in evolution, could explain the methylation dynamic of protein-coding genes over evolutionary time. This evolutionary perspective provides a dynamic understanding of the intrinsic relationship between DNA methylation and its functional and evolutionary effects on the genomes.
Affiliation(s)
- Ziwen Li
- Biology and Agriculture Research Center, School of Chemistry and Biological Engineering, University of Science and Technology Beijing, Beijing 100024, China; Beijing Engineering Laboratory of Main Crop Bio-Tech Breeding, Beijing International Science and Technology Cooperation Base of Biotechnology Breeding, Beijing Solidwill Sci-Tech Co. Ltd., Beijing 100192, China
- Xiangyuan Wan
- Biology and Agriculture Research Center, School of Chemistry and Biological Engineering, University of Science and Technology Beijing, Beijing 100024, China; Beijing Engineering Laboratory of Main Crop Bio-Tech Breeding, Beijing International Science and Technology Cooperation Base of Biotechnology Breeding, Beijing Solidwill Sci-Tech Co. Ltd., Beijing 100192, China.
|
249
|
Jiang M, Dong X, Lang H, Pang W, Zhan Z, Li X, Piao Z. Mining of Brassica-Specific Genes (BSGs) and Their Induction in Different Developmental Stages and under Plasmodiophora brassicae Stress in Brassica rapa. Int J Mol Sci 2018; 19:ijms19072064. [PMID: 30012965 PMCID: PMC6073354 DOI: 10.3390/ijms19072064] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 05/30/2018] [Revised: 06/29/2018] [Accepted: 07/13/2018] [Indexed: 11/16/2022] Open
Abstract
Orphan genes, also called lineage-specific genes (LSGs), are important for responses to biotic and abiotic stresses, and are associated with lineage-specific structures and biological functions. To date, there have been no studies investigating gene number, gene features, or gene expression patterns of orphan genes in Brassica rapa. In this study, 1540 Brassica-specific genes (BSGs) and 1824 Cruciferae-specific genes (CSGs) were identified based on the genome of Brassica rapa. The genic features analysis indicated that BSGs and CSGs possessed a lower percentage of multi-exon genes, higher GC content, and shorter gene length than evolutionary-conserved genes (ECGs). In addition, five types of BSGs were obtained and 145 out of 529 real A subgenome-specific BSGs were verified by PCR in 51 species. In silico and semi-qPCR gene expression analyses of BSGs suggested that BSGs are expressed in various tissues and can be induced by Plasmodiophora brassicae. Moreover, an A/C subgenome-specific BSG, BSGs1, was specifically expressed during the heading stage, indicating that the gene might be associated with leafy head formation. Our results provide valuable biological information for studying the molecular function of BSGs for Brassica-specific phenotypes and biotic stress in B. rapa.
Affiliation(s)
- Mingliang Jiang
- College of Horticulture, Shenyang Agricultural University, #120 Dongling Road, Shenyang 110866, China.
- Xiangshu Dong
- School of Agriculture, Yunnan University, Kunming 650504, China.
- Hong Lang
- Key Laboratory of Northeast Rice Biology and Breeding, Ministry of Agriculture, Rice Research Institute, Shenyang Agricultural University, Shenyang 110866, China.
- Wenxing Pang
- College of Horticulture, Shenyang Agricultural University, #120 Dongling Road, Shenyang 110866, China.
- Zongxiang Zhan
- College of Horticulture, Shenyang Agricultural University, #120 Dongling Road, Shenyang 110866, China.
- Xiaonan Li
- College of Horticulture, Shenyang Agricultural University, #120 Dongling Road, Shenyang 110866, China.
- Zhongyun Piao
- College of Horticulture, Shenyang Agricultural University, #120 Dongling Road, Shenyang 110866, China.
|
250
|
Klasberg S, Bitard-Feildel T, Callebaut I, Bornberg-Bauer E. Origins and structural properties of novel and de novo protein domains during insect evolution. FEBS J 2018; 285:2605-2625. [PMID: 29802682 DOI: 10.1111/febs.14504] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 06/25/2017] [Revised: 04/12/2018] [Accepted: 05/11/2018] [Indexed: 12/11/2022]
Abstract
Over long time scales, protein evolution is characterized by modular rearrangements of protein domains. Such rearrangements are mainly caused by gene duplication, fusion and terminal losses. To better understand domain emergence mechanisms we investigated 32 insect genomes covering a speciation gradient ranging from ~2 to ~390 mya. We use established domain models and foldable domains delineated by hydrophobic cluster analysis (HCA), which does not require homologous sequences, to also identify domains which have likely arisen de novo, that is, from previously noncoding DNA. Our results indicate that most novel domains emerge terminally as they originate from ORF extensions while fewer arise in middle arrangements, resulting from exonization of intronic or intergenic regions. Many novel domains rapidly migrate between terminal or middle positions and single- and multidomain arrangements. Young domains, such as most HCA-defined domains, are under strong selection pressure as they show signals of purifying selection. De novo domains, linked to ancient domains or defined by HCA, have higher degrees of intrinsic disorder and disorder-to-order transition upon binding than ancient domains. However, the corresponding DNA sequences of the novel domains of de novo origins could only rarely be found in sister genomes. We conclude that novel domains are often recruited by other proteins and undergo important structural modifications shortly after their emergence, but evolve too fast to be characterized by cross-species comparisons alone.
Affiliation(s)
- Steffen Klasberg
- Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Germany
- Tristan Bitard-Feildel
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Paris, France
- Isabelle Callebaut
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, IRD, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Germany
|