1
Klimovich A, Bosch TCG. Novel technologies uncover novel 'anti'-microbial peptides in Hydra shaping the species-specific microbiome. Philos Trans R Soc Lond B Biol Sci 2024; 379:20230058. [PMID: 38497265 PMCID: PMC10945409 DOI: 10.1098/rstb.2023.0058] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/17/2023] [Accepted: 11/16/2023] [Indexed: 03/19/2024] Open
Abstract
The freshwater polyp Hydra uses an elaborate innate immune machinery to maintain its specific microbiome. Major components of this toolkit are conserved Toll-like receptor (TLR)-mediated immune pathways and species-specific antimicrobial peptides (AMPs). Our study harnesses advanced technologies, such as high-throughput sequencing and machine learning, to uncover the high complexity of Hydra's AMP repertoire. Functional analysis reveals that these AMPs are specific against diverse members of the Hydra microbiome and expressed in a spatially controlled pattern. Notably, in the outer epithelial layer, AMPs are produced mainly in the neurons. The neuron-derived AMPs are secreted directly into the glycocalyx, the habitat for symbiotic bacteria, and display high selectivity and spatial restriction of expression. In the endodermal layer, in contrast, endodermal epithelial cells produce an abundance of different AMPs including members of the arminin and hydramacin families, while gland cells secrete kazal-type protease inhibitors. Since the endodermal layer lines the gastric cavity devoid of symbiotic bacteria, we assume that endodermally secreted AMPs protect the gastric cavity from intruding pathogens. In conclusion, Hydra employs a complex set of AMPs expressed in distinct tissue layers and cell types to combat pathogens and to maintain a stable spatially organized microbiome. This article is part of the theme issue 'Sculpting the microbiome: how host factors determine and respond to microbial colonization'.
Affiliation(s)
- Alexander Klimovich
  - Zoological Institute, Christian-Albrechts University of Kiel, Am Botanischen Garten 1-9, Kiel 24118, Germany
- Thomas C. G. Bosch
  - Zoological Institute, Christian-Albrechts University of Kiel, Am Botanischen Garten 1-9, Kiel 24118, Germany
2
Domazet-Lošo M, Široki T, Šimičević K, Domazet-Lošo T. Macroevolutionary dynamics of gene family gain and loss along multicellular eukaryotic lineages. Nat Commun 2024; 15:2663. [PMID: 38531970 DOI: 10.1038/s41467-024-47017-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Accepted: 03/11/2024] [Indexed: 03/28/2024] Open
Abstract
The gain and loss of genes fluctuate over evolutionary time in major eukaryotic clades. However, the full profile of these macroevolutionary trajectories is still missing. To give a more inclusive view on the changes in genome complexity across the tree of life, here we recovered the evolutionary dynamics of gene family gain and loss ranging from the ancestor of cellular organisms to 352 eukaryotic species. We show that in all considered lineages the gene family content follows a common evolutionary pattern, where the number of gene families reaches the highest value at a major evolutionary and ecological transition, and then gradually decreases towards extant organisms. This supports theoretical predictions and suggests that the genome complexity is often decoupled from commonly perceived organismal complexity. We conclude that simplification by gene family loss is a dominant force in Phanerozoic genomes of various lineages, probably underpinned by intense ecological specializations and functional outsourcing.
Affiliation(s)
- Mirjana Domazet-Lošo
  - Department of Applied Computing, Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, HR-10000, Zagreb, Croatia
- Tin Široki
  - Department of Applied Computing, Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, HR-10000, Zagreb, Croatia
- Korina Šimičević
  - Department of Applied Computing, Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, HR-10000, Zagreb, Croatia
- Tomislav Domazet-Lošo
  - Laboratory of Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000, Zagreb, Croatia
  - School of Medicine, Catholic University of Croatia, Ilica 242, HR-10000, Zagreb, Croatia
3
Fleck K, Luria V, Garag N, Karger A, Hunter T, Marten D, Phu W, Nam KM, Sestan N, O’Donnell-Luria AH, Erceg J. Functional associations of evolutionarily recent human genes exhibit sensitivity to the 3D genome landscape and disease. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.17.585403. [PMID: 38559085 PMCID: PMC10980080 DOI: 10.1101/2024.03.17.585403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Genome organization is intricately tied to regulating genes and associated cell fate decisions. In this study, we examine the positioning and functional significance of human genes, grouped by their evolutionary age, within the 3D organization of the genome. We reveal that genes of different evolutionary origin have distinct positioning relationships with both domains and loop anchors, and remarkably consistent relationships with boundaries across cell types. While the functional associations of each group of genes are primarily cell type-specific, such associations of conserved genes maintain greater stability across 3D genomic features and disease than recently evolved genes. Furthermore, the expression of these genes across various tissues follows an evolutionary progression, such that RNA levels increase from young genes to ancient genes. Thus, the distinct relationships of gene evolutionary age, function, and positioning within 3D genomic features contribute to tissue-specific gene regulation in development and disease.
Affiliation(s)
- Katherine Fleck
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269
- Victor Luria
  - Department of Neuroscience, Yale School of Medicine, New Haven, CT 06510
  - Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA 02115
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142
  - Department of Systems Biology, Harvard Medical School, Boston, MA 02115
- Nitanta Garag
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269
- Amir Karger
  - IT-Research Computing, Harvard Medical School, Boston, MA 02115
- Trevor Hunter
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269
- Daniel Marten
  - Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA 02115
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142
- William Phu
  - Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA 02115
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142
- Kee-Myoung Nam
  - Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT 06510
- Nenad Sestan
  - Department of Neuroscience, Yale School of Medicine, New Haven, CT 06510
- Anne H. O’Donnell-Luria
  - Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA 02115
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142
  - Department of Pediatrics, Harvard Medical School, Boston, MA 02115
- Jelena Erceg
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269
  - Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT 06030
4
Álvarez-Lugo A, Becerra A. The Fate of Duplicated Enzymes in Prokaryotes: The Case of Isomerases. J Mol Evol 2023; 91:76-92. [PMID: 36580111 DOI: 10.1007/s00239-022-10085-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2022] [Accepted: 12/16/2022] [Indexed: 12/30/2022]
Abstract
Isomerases are a unique class of enzymes that carry out a great diversity of chemical reactions at the intramolecular level. This class comprises about 300 members, most of which are involved in carbohydrate and terpenoid/polyketide metabolism. Along with oxidoreductases and translocases, isomerases are one of the classes with the highest ratio of paralogous enzymes. Because the class has relatively few members, it can feasibly be explored in greater detail to identify specific cases of gene duplication. Here, we present an analysis at the level of individual isomerases and identify different members that seem to be involved in duplication events in prokaryotes. As was suggested in a previous study, there is no homogeneous distribution of paralogs, but rather they accumulate into a few subcategories, some of which differ between Archaea and Bacteria. As expected, the metabolic processes with more paralogous isomerases have to do with carbohydrate metabolism but also with RNA modification (a particular case involving an rRNA-modifying isomerase is discussed and analyzed in detail). Overall, our findings suggest that the most common fate for paralogous enzymes is the retention of the original enzymatic function, either associated with a dosage effect or with differential expression in response to changing environments, followed by subfunctionalization and, to a much lesser degree, neofunctionalization, which is consistent with what has been reported elsewhere.
Affiliation(s)
- Alejandro Álvarez-Lugo
  - Posgrado en Ciencias Biológicas, Universidad Nacional Autónoma de México, Mexico City, México
  - Facultad de Ciencias, Universidad Nacional Autónoma de México, Mexico City, México
- Arturo Becerra
  - Facultad de Ciencias, Universidad Nacional Autónoma de México, Mexico City, México
5
Moutinho AF, Eyre-Walker A, Dutheil JY. Strong evidence for the adaptive walk model of gene evolution in Drosophila and Arabidopsis. PLoS Biol 2022; 20:e3001775. [PMID: 36099311 PMCID: PMC9470001 DOI: 10.1371/journal.pbio.3001775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2021] [Accepted: 08/01/2022] [Indexed: 11/19/2022] Open
Abstract
Understanding the dynamics of species adaptation to their environments has long been a central focus of the study of evolution. Theories of adaptation propose that populations evolve by “walking” in a fitness landscape. This “adaptive walk” is characterised by a pattern of diminishing returns, where populations further away from their fitness optimum take larger steps than those closer to their optimal conditions. Hence, we expect young genes to evolve faster and experience mutations with stronger fitness effects than older genes because they are further away from their fitness optimum. Testing this hypothesis, however, constitutes an arduous task. Young genes are small, encode proteins with a higher degree of intrinsic disorder, are expressed at lower levels, and are involved in species-specific adaptations. Since all these factors lead to increased protein evolutionary rates, they could be masking the effect of gene age. While controlling for these factors, we used population genomic data sets of Arabidopsis and Drosophila and estimated the rate of adaptive substitutions across genes from different phylostrata. We found that a gene’s evolutionary age significantly impacts the molecular rate of adaptation. Moreover, we observed that substitutions in young genes tend to have larger physicochemical effects. Our study, therefore, provides strong evidence that molecular evolution follows an adaptive walk model across a large evolutionary timescale. This study uses population genomic datasets from Arabidopsis and Drosophila to show that young genes adapt faster and are subject to mutations of larger fitness effects, providing strong evidence that molecular evolution follows an adaptive walk model across a large evolutionary timescale.
Affiliation(s)
- Ana Filipa Moutinho
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
  - School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Adam Eyre-Walker
  - School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Julien Y. Dutheil
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
  - Unité Mixte de Recherche 5554 Institut des Sciences de l’Evolution, CNRS, IRD, EPHE, Université de Montpellier, Montpellier, France
6
Heinen T, Xie C, Keshavarz M, Stappert D, Künzel S, Tautz D. Evolution of a New Testis-Specific Functional Promoter Within the Highly Conserved Map2k7 Gene of the Mouse. Front Genet 2022; 12:812139. [PMID: 35069705 PMCID: PMC8766832 DOI: 10.3389/fgene.2021.812139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2021] [Accepted: 12/08/2021] [Indexed: 12/03/2022] Open
Abstract
Map2k7 (synonym Mkk7) is a conserved regulatory kinase gene and a central component of the JNK signaling cascade with key functions during cellular differentiation. It shows complex transcription patterns, and different transcript isoforms are known in the mouse (Mus musculus). We have previously identified a newly evolved testis-specific transcript for the Map2k7 gene in the subspecies M. m. domesticus. Here, we identify the new promoter that drives this transcript and find that it codes for an open reading frame (ORF) of 50 amino acids. The new promoter was gained in the stem lineage of closely related mouse species but was secondarily lost in the subspecies M. m. musculus and M. m. castaneus. A single mutation can be correlated with its transcriptional activity in M. m. domesticus, and cell culture assays demonstrate the capability of this mutation to drive expression. A mouse knockout line in which the promoter region of the new transcript is deleted reveals a functional contribution of the newly evolved promoter to sperm motility and the spermatid transcriptome. Our data show that a new functional transcript (and possibly protein) can evolve within an otherwise highly conserved gene, supporting the notion of regulatory changes contributing to the emergence of evolutionary novelties.
Affiliation(s)
- Chen Xie
  - Max-Planck Institute for Evolutionary Biology, Plön, Germany
- Maryam Keshavarz
  - Max-Planck Institute for Evolutionary Biology, Plön, Germany
  - Deutsches Zentrum für Neurodegenerative Erkrankungen e. V. (DZNE), Bonn, Germany
- Dominik Stappert
  - Deutsches Zentrum für Neurodegenerative Erkrankungen e. V. (DZNE), Bonn, Germany
- Sven Künzel
  - Max-Planck Institute for Evolutionary Biology, Plön, Germany
- Diethard Tautz
  - Max-Planck Institute for Evolutionary Biology, Plön, Germany
7
Castro JF, Tautz D. The Effects of Sequence Length and Composition of Random Sequence Peptides on the Growth of E. coli Cells. Genes (Basel) 2021; 12:1913. [PMID: 34946861 PMCID: PMC8702183 DOI: 10.3390/genes12121913] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/17/2021] [Revised: 11/22/2021] [Accepted: 11/26/2021] [Indexed: 12/21/2022] Open
Abstract
We study the potential for the de novo evolution of genes from random nucleotide sequences using libraries of E. coli expressing random sequence peptides. We assess the effects of such peptides on cell growth by monitoring frequency changes in individual clones in a complex library through four serial passages. Using a new analysis pipeline that allows the tracing of peptides of all lengths, we find that over half of the peptides have consistent effects on cell growth. Across nine different experiments, around 16% of clones increase in frequency and 36% decrease, with some variation between individual experiments. Shorter peptides (8-20 residues) are more likely to increase in frequency, whereas longer ones are more likely to decrease. GC content, amino acid composition, intrinsic disorder, and aggregation propensity show slightly different patterns between peptide groups. Sequences that increase in frequency tend to be more disordered with lower aggregation propensity. This coincides with the observation that young genes with more disordered structures are better tolerated in genomes. Our data indicate that random sequences can be a source of evolutionary innovation, since a large fraction of them are well tolerated by the cells or can provide a growth advantage.
Affiliation(s)
- Diethard Tautz
  - Max Planck Institute for Evolutionary Biology, August-Thienemann Strasse 2, 24306 Plön, Germany
8
Goymann W, Schwabl H. The tyranny of phylogeny-A plea for a less dogmatic stance on two-species comparisons: Funding bodies, journals and referees discourage two- or few-species comparisons, but such studies provide essential insights complementary to phylogenetic comparative studies. Bioessays 2021; 43:e2100071. [PMID: 34155665 DOI: 10.1002/bies.202100071] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/08/2021] [Revised: 06/04/2021] [Accepted: 06/08/2021] [Indexed: 11/11/2022]
Abstract
Phylogenetically controlled studies across multiple species correct for taxonomic confounds in physiological performance traits. Therefore, they are preferred over comparisons of two or a few closely related species. Funding bodies, referees and journal editors nowadays often even refuse to consider detailed comparisons of two or a few closely related species. Here, we plead for a less dogmatic stance on such comparisons, because phylogenetic studies come with their own limitations, similar in magnitude to those of two-species comparisons. Two-species comparisons are particularly relevant and instructive for understanding physiological pathways and de novo mutations in three contexts: in a purely mechanistic context, when differences in the regulation of a trait are the focus of investigation; when a physiological trait lacks a direct connection to fitness; and when physiological measures cannot easily be standardized among laboratories. In conclusion, phylogenetic comparative and two-species studies have different strengths and weaknesses, and combining these complementary approaches will help to integrate biology.
Affiliation(s)
- Wolfgang Goymann
  - Department of Behavioural Neurobiology, Max Planck Institute for Ornithology, Seewiesen, Germany
- Hubert Schwabl
  - School of Biological Sciences, Washington State University, Pullman, Washington, USA
9
Xie C, Bekpen C, Künzel S, Keshavarz M, Krebs-Wheaton R, Skrabar N, Ullrich KK, Zhang W, Tautz D. Dedicated transcriptomics combined with power analysis lead to functional understanding of genes with weak phenotypic changes in knockout lines. PLoS Comput Biol 2020; 16:e1008354. [PMID: 33180766 PMCID: PMC7685438 DOI: 10.1371/journal.pcbi.1008354] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/20/2020] [Revised: 11/24/2020] [Accepted: 09/20/2020] [Indexed: 12/26/2022] Open
Abstract
Systematic knockout studies in mice have shown that a large fraction of the gene replacements show no lethal or other overt phenotypes. This has led to the development of more refined analysis schemes, including physiological, behavioral, developmental and cytological tests. However, transcriptomic analyses have not yet been systematically evaluated for non-lethal knockouts. We conducted a power analysis to determine the experimental conditions under which even small changes in transcript levels can be reliably traced. We have applied this to two gene disruption lines of genes for which no function was known so far. Dedicated phenotyping tests informed by the tissues and stages of highest expression of the two genes show small effects on the tested phenotypes. For the transcriptome analysis of these stages and tissues, we used a prior power analysis to determine the number of biological replicates and the sequencing depth. We find that under these conditions, the knockouts have a significant impact on the transcriptional networks, with thousands of genes showing small transcriptional changes. GO analysis suggests that A930004D18Rik is involved in developmental processes through contributing to protein complexes, and A830005F24Rik in extracellular matrix functions. Subsampling analysis of the data reveals that the increase in the number of biological replicates was more important than increasing the sequencing depth to arrive at these results. Hence, our proof-of-principle experiment suggests that transcriptomic analysis is indeed an option to study gene functions of genes with weak or no traceable phenotypic effects and it provides the boundary conditions under which this is possible.
Knockout mice benefit the understanding of gene functions in mammals. However, it has proven difficult for many genes to identify clear phenotypes, owing to a lack of sufficient assays. As Lewis Wolpert put it in a famous quote, “But did you take them to the opera?”, thus metaphorically alluding to the need to extend phenotyping efforts. This insight led to the establishment of phenotyping pipelines that are nowadays routinely used to characterize knock-out lines. However, transcriptomic approaches based on RNA-Seq have been much less explored for such deep-level studies. Here, we conducted both a theoretical power analysis and practical RNA-Seq experiments on two knockout lines with small phenotypic effects to investigate parameters including sample size, sequencing depth, fold change, and dispersion. Our dedicated RNA-Seq studies discovered thousands of genes with small transcriptional changes that are enriched in specific functions in both knockout lines. We find that it is more important to increase the number of samples than to increase the sequencing depth. Our work shows that a deep RNA-Seq study on knockouts is powerful for understanding gene functions in cases of weak phenotypic effects, and provides a guideline for the experimental design of such studies.
Affiliation(s)
- Chen Xie
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Cemalettin Bekpen
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Sven Künzel
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Maryam Keshavarz
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Rebecca Krebs-Wheaton
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Neva Skrabar
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Kristian K. Ullrich
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Wenyu Zhang
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Diethard Tautz
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
10
Abstract
New species arise as the genomes of populations diverge. The developmental 'alarm clock' of speciation sounds off when sufficient divergence in genetic control of development leads hybrid individuals to infertility or inviability, the world awoken to the dawn of new species with intrinsic post-zygotic reproductive isolation. Some developmental stages will be more prone to hybrid dysfunction due to how molecular evolution interacts with the ontogenetic timing of gene expression. Considering the ontogeny of hybrid incompatibilities provides a profitable connection between 'evo-devo' and speciation genetics to better link macroevolutionary pattern, microevolutionary process, and molecular mechanisms. Here, we explore speciation alongside development, emphasizing their mutual dependence on genetic network features, fitness landscapes, and developmental system drift. We assess models for how ontogenetic timing of reproductive isolation can be predictable. Experiments and theory within this synthetic perspective can help identify new rules of speciation as well as rules in the molecular evolution of development.
Affiliation(s)
- Asher D Cutter
  - Department of Ecology & Evolutionary Biology, University of Toronto, Toronto, Canada
- Joanna D Bundus
  - Department of Integrative Biology, University of Wisconsin – Madison, Madison, United States
11
Combination of Proteogenomics with Peptide De Novo Sequencing Identifies New Genes and Hidden Posttranscriptional Modifications. mBio 2019; 10:mBio.02367-19. [PMID: 31615963 PMCID: PMC6794485 DOI: 10.1128/mbio.02367-19] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Indexed: 12/20/2022] Open
Abstract
Next-generation sequencing techniques have considerably increased the number of completely sequenced eukaryotic genomes. These genomes are mostly automatically annotated, and ab initio gene prediction is commonly combined with homology-based search approaches and often supported by transcriptomic data. The latter in particular improve the prediction of intron splice sites and untranslated regions. However, correct prediction of translation initiation sites (TIS), alternative splice junctions, and protein-coding potential remains challenging. Here, we present an advanced proteogenomics approach, namely, the combination of proteogenomics and de novo peptide sequencing analysis, in conjunction with Blast2GO and phylostratigraphy. Using the model fungus Sordaria macrospora as an example, we provide a comprehensive view of the proteome that not only increases the functional understanding of this multicellular organism at different developmental stages but also immensely enhances the genome annotation quality. Proteogenomics combines proteomics, genomics, and transcriptomics and has considerably improved genome annotation in poorly investigated phylogenetic groups for which homology information is lacking. Furthermore, it can be advantageous when reinvestigating well-annotated genomes. Here, we applied an advanced proteogenomics approach, combining standard proteogenomics with peptide de novo sequencing, to refine annotation of the well-studied model fungus Sordaria macrospora. We investigated samples from different developmental and physiological conditions, resulting in the detection of 104 so-far hidden proteins and annotation changes in 575 genes, including 389 splice site refinements. Significantly, our approach provides peptide-level evidence for 113 single-amino-acid variations and 15 C-terminal protein elongations originating from A-to-I RNA editing, a phenomenon recently detected in fungi. 
Coexpression and phylostratigraphic analysis of the refined proteome suggest that new functions in evolutionarily young genes correlate with distinct developmental stages. In conclusion, our advanced proteogenomics approach supports and promotes functional studies of fungal model systems.
12
In-depth analysis of Bacillus subtilis proteome identifies new ORFs and traces the evolutionary history of modified proteins. Sci Rep 2018; 8:17246. [PMID: 30467398 PMCID: PMC6250715 DOI: 10.1038/s41598-018-35589-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 09/03/2018] [Accepted: 11/07/2018] [Indexed: 01/05/2023] Open
Abstract
Bacillus subtilis is a sporulating Gram-positive bacterium widely used in basic research and biotechnology. Despite being one of the best-characterized bacterial model organisms, recent proteomics studies identified only about 50% of its theoretical protein count. Here we combined several hundred MS measurements to obtain a comprehensive map of the proteome, phosphoproteome and acetylome of B. subtilis grown at 37 °C in minimal medium. We covered 75% of the theoretical proteome (3,159 proteins), detected 1,085 phosphorylation and 4,893 lysine acetylation sites and performed a systematic bioinformatic characterization of the obtained data. A subset of analyzed MS files allowed us to reconstruct a network of Hanks-type protein kinases, Ser/Thr/Tyr phosphatases and their substrates. We applied genomic phylostratigraphy to gauge the evolutionary age of B. subtilis protein classes and revealed that protein modifications were present on the oldest bacterial proteins. Finally, we performed a proteogenomic analysis by mapping all MS spectra onto a six-frame translation of the B. subtilis genome and found evidence for 19 novel ORFs. We provide the most extensive overview of the proteome and post-translational modifications for B. subtilis to date, with insights into functional annotation and evolutionary aspects of the B. subtilis genome.
13
Bekpen C, Xie C, Tautz D. Dealing with the adaptive immune system during de novo evolution of genes from intergenic sequences. BMC Evol Biol 2018; 18:121. [PMID: 30075701 PMCID: PMC6091031 DOI: 10.1186/s12862-018-1232-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 01/24/2018] [Accepted: 07/16/2018] [Indexed: 12/26/2022] Open
Abstract
Background: The adaptive immune system of vertebrates has an extraordinary potential to sense and neutralize foreign antigens entering the body. De novo evolution of genes implies that the genome itself expresses novel antigens from intergenic sequences which could cause a problem with this immune system. Peptides from these novel proteins could be presented by the major histocompatibility complex (MHC) receptors to the cell surface and would be recognized as foreign. The respective cells would then be attacked and destroyed, or would cause inflammatory responses. Hence, de novo expressed peptides have to be introduced to the immune system as being self-peptides to avoid such autoimmune reactions. The regulation of the distinction between self and non-self starts during embryonic development, but continues late into adulthood. It is mostly mediated by specialized cells in the thymus, but can also be conveyed in peripheral tissues, such as the lymph nodes and the spleen. The self-antigens need to be exposed to the reactive T-cells, which requires the expression of the genes in the respective tissues. Since the initial activation of a promoter for new intergenic transcription of a de novo gene could occur in any tissue, we should expect that the evolutionary establishment of a de novo gene in animals with an adaptive immune system should also involve expression in at least one of the tissues that confer self-recognition.
Results: We have studied this question by analyzing the transcriptomes of multiple tissues from young mice in three closely related natural populations of the house mouse (M. m. domesticus). We find that new intergenic transcription indeed occurs mostly in only a single tissue. When a second tissue becomes involved, thymus and spleen are significantly overrepresented.
Conclusions: We conclude that the inclusion of de novo transcripts in the processes for the induction of self-tolerance is indeed an important step in the evolution of functional de novo genes in vertebrates.
Affiliation(s)
- Cemalettin Bekpen
  - Max-Planck Institute for Evolutionary Biology, August-Thienemannstr. 2, 24306, Plön, Germany
- Chen Xie
  - Max-Planck Institute for Evolutionary Biology, August-Thienemannstr. 2, 24306, Plön, Germany
- Diethard Tautz
  - Max-Planck Institute for Evolutionary Biology, August-Thienemannstr. 2, 24306, Plön, Germany
14
Li Z, Wan X. Long-term evolutionary DNA methylation dynamic of protein-coding genes and its underlying mechanism. Gene 2018; 677:96-104. [PMID: 30031907 DOI: 10.1016/j.gene.2018.07.051] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/18/2018] [Revised: 07/05/2018] [Accepted: 07/18/2018] [Indexed: 10/28/2022]
Abstract
DNA methylation is an important type of epigenetic modification for the maintenance of genome functionality and stability. Although there are many studies on DNA methylation patterns, mechanisms, and functions, no study has focused on the evolutionary dynamic of DNA methylation. Here, we present the first genome-wide pattern of evolutionary DNA methylation dynamic in protein-coding genes, by grouping the Arabidopsis thaliana protein-coding genes into several conservation levels representing different evolutionary ages, and by investigating their DNA methylation features for three methylation contexts in both genic and flanking regions. The main results, over a long-term evolutionary period, are: (1) genic CHG and CHH methylation levels tend to decrease over time, mainly due to reductions in the number of siRNA target sites in genes; (2) genic CG methylation levels are first reduced and then increased on average over evolutionary time, which is the combined result of an increased proportion of CG methylated genes and a decreased CG methylation level of those genes; and (3) increased gene length and the stochastic methylation mechanism in the CG context may further account for the genic CG methylation trend in evolution. The diverse DNA methylation mechanisms in different contexts, together with altered gene length in evolution, could explain the methylation dynamic of protein-coding genes over evolutionary time. This evolutionary perspective provides a dynamic understanding of the intrinsic relationship between DNA methylation and its functional and evolutionary effects on genomes.
Affiliation(s)
- Ziwen Li
  - Biology and Agriculture Research Center, School of Chemistry and Biological Engineering, University of Science and Technology Beijing, Beijing 100024, China; Beijing Engineering Laboratory of Main Crop Bio-Tech Breeding, Beijing International Science and Technology Cooperation Base of Biotechnology Breeding, Beijing Solidwill Sci-Tech Co. Ltd., Beijing 100192, China
- Xiangyuan Wan
  - Biology and Agriculture Research Center, School of Chemistry and Biological Engineering, University of Science and Technology Beijing, Beijing 100024, China; Beijing Engineering Laboratory of Main Crop Bio-Tech Breeding, Beijing International Science and Technology Cooperation Base of Biotechnology Breeding, Beijing Solidwill Sci-Tech Co. Ltd., Beijing 100192, China
15
Banerjee S, Chakraborty S. Protein intrinsic disorder negatively associates with gene age in different eukaryotic lineages. Mol Biosyst 2018; 13:2044-2055. [PMID: 28783193 DOI: 10.1039/c7mb00230k] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/12/2022]
Abstract
The emergence of new protein-coding genes in a specific lineage or species provides raw materials for evolutionary adaptations. Until recently, the biology of new genes emerging particularly from non-genic sequences remained unexplored. Although the new genes are subjected to variable selection pressure and face rapid deletion, some of them become functional and are retained in the gene pool. To acquire functional novelties, new genes often get integrated into the pre-existing ancestral networks. However, the mechanism by which young proteins acquire novel interactions remains unanswered to date. Since structural orientation contributes hugely to the mode of proteins' physical interactions, we put forward an interesting question: do new genes encode proteins with stable folds? Addressing the question, we demonstrated that intrinsic disorder inversely correlates with evolutionary gene age, i.e. young proteins are richer in intrinsic disorder than ancient ones. We further noted that young proteins, which are initially poorly connected hubs, prefer to be structurally more disordered than well-connected ancient proteins. The phenomenon strikingly defies the usual trend of well-connected proteins being highly disordered in structure. We justified that structural disorder might help poorly connected young proteins to undergo promiscuous interactions, which provides the foundation for novel protein interactions. The study focuses on the evolutionary perspectives of young proteins in the light of structural adaptations.
Affiliation(s)
- Sanghita Banerjee
  - Machine Intelligence Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, Kolkata 700108, India
16
Gambetta GA, Matthews MA, Syvanen M. The Xylella fastidosa RTX operons: evidence for the evolution of protein mosaics through novel genetic exchanges. BMC Genomics 2018; 19:329. [PMID: 29728072 PMCID: PMC5935956 DOI: 10.1186/s12864-018-4731-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 08/08/2017] [Accepted: 04/26/2018] [Indexed: 01/14/2023] Open
Abstract
BACKGROUND: Xylella fastidiosa (Xf) is a Gram-negative bacterium inhabiting the plant vascular system. In most species this bacterium lives as a benign symbiote, but in several agriculturally important plants (e.g. coffee, citrus, grapevine) Xf is pathogenic. Xf has four loci encoding homologues to hemolysin RTX proteins, virulence factors involved in a wide range of plant-pathogen interactions.
RESULTS: We show that all four genes are expressed during pathogenesis in grapevine. The sequences from these four genes have a complex repetitive structure. At the C-termini, sequence diversity between strains is what would be expected from orthologous genes. However, within strains there is no N-terminal homology, indicating these loci encode RTXs of different functions and/or specificities. More striking is that many of the orthologous loci between strains share this extreme variation at the N-termini. Thus these RTX orthologues are most easily visualized as fusions between the orthologous C-termini and different N-termini. Further, the four genes are found in operons having a peculiar structure with an extensively duplicated module encoding a small protein with homology to the N-terminal region of the full length RTX. Surprisingly, some of these small peptides are most similar not to their corresponding full length RTX, but to the N-termini of RTXs from other Xf strains, and even other remotely related species.
CONCLUSIONS: These results demonstrate that these genes are expressed in planta during pathogenesis. Their structure suggests extensive evolutionary restructuring through horizontal gene transfers and heterologous recombination mechanisms. The sum of the evidence suggests these repetitive modules are a novel kind of mobile genetic element.
Affiliation(s)
- Gregory A Gambetta
  - Bordeaux Science Agro, Institut des Sciences de la Vigne et du Vin, Ecophysiologie et Génomique Fonctionnelle de la Vigne, UMR 1287, F- 33140, Villenave d'Ornon, France
- Mark A Matthews
  - Department of Viticulture and Enology, University of California, Davis, CA, 95616-8645, USA
- Michael Syvanen
  - Department of Medical Microbiology and Immunology, School of Medicine, University of California, Davis, CA, 95616-8645, USA
17
Pang Y, Mao C, Liu S. Encoding activities of non-coding RNAs. Theranostics 2018; 8:2496-2507. [PMID: 29721095 PMCID: PMC5928905 DOI: 10.7150/thno.24677] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 01/01/2018] [Accepted: 02/25/2018] [Indexed: 12/14/2022] Open
Abstract
The universal expression of various non-coding RNAs (ncRNAs) is now considered the main feature of organisms' genomes. Many regions in the genome are transcribed but not annotated to encode proteins, yet contain small open reading frames (smORFs). A widely accepted opinion is that a vast majority of ncRNAs are not further translated. However, increasing evidence underlines a series of intriguing translational events from the ncRNAs, which were previously considered to lack coding potential. Recent studies also suggest that products derived from such novel translational events display important regulatory functions in many fundamental biological and pathological processes. Here we give a critical review on the potential coding capacity of ncRNAs, in particular, about what is known and unknown in this emerging area. We also discuss the possible underlying coding mechanisms of these extraordinary ncRNAs and possible roles of peptides or proteins derived from the ncRNAs in disease development and theranostics. Our review offers an extensive resource for studying the biology of ncRNAs and sheds light into the use of ncRNAs and their corresponding peptides or proteins for disease diagnosis and therapy.
18
Pezer Ž, Chung AG, Karn RC, Laukaitis CM. Analysis of Copy Number Variation in the Abp Gene Regions of Two House Mouse Subspecies Suggests Divergence during the Gene Family Expansions. Genome Biol Evol 2018; 9:3858091. [PMID: 28575204 PMCID: PMC5513543 DOI: 10.1093/gbe/evx099] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Accepted: 05/26/2017] [Indexed: 12/26/2022] Open
Abstract
The Androgen-binding protein (Abp) gene region of the mouse genome contains 64 genes, some encoding pheromones that influence assortative mating between mice from different subspecies. Using CNVnator and quantitative PCR, we explored copy number variation in this gene family in natural populations of Mus musculus domesticus (Mmd) and Mus musculus musculus (Mmm), two subspecies of house mice that form a narrow hybrid zone in Central Europe. We found that copy number variation in the center of the Abp gene region is very common in wild Mmd, primarily representing the presence/absence of the final duplications described for the mouse genome. Clustering of Mmd individuals based on this variation did not reflect their geographical origin, suggesting no population divergence in the Abp gene cluster. However, copy number variation patterns differ substantially between Mmd and other mouse taxa. Large blocks of Abp genes are absent in Mmm, Mus musculus castaneus and an outgroup, Mus spretus, although with differences in variation and breakpoint locations. Our analysis calls into question the reliance on a reference genome for interpreting the detailed organization of genes in taxa more distant from the Mmd reference genome. The polymorphic nature of the gene family expansion in all four taxa suggests that the number of Abp genes, especially in the central gene region, is not critical to the survival and reproduction of the mouse. However, Abp haplotypes of variable length may serve as a source of raw genetic material for new signals influencing reproductive communication and thus speciation of mice.
Affiliation(s)
- Željka Pezer
  - Max Planck Institute for Evolutionary Biology, Plön, Germany
  - Ruđer Bošković Institute, Zagreb, Croatia
- Amanda G Chung
  - Department of Medicine, College of Medicine, University of Arizona
- Robert C Karn
  - Department of Medicine, College of Medicine, University of Arizona
19
Lei L, Steffen JG, Osborne EJ, Toomajian C. Plant organ evolution revealed by phylotranscriptomics in Arabidopsis thaliana. Sci Rep 2017; 7:7567. [PMID: 28790409 PMCID: PMC5548721 DOI: 10.1038/s41598-017-07866-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 03/21/2017] [Accepted: 07/04/2017] [Indexed: 11/18/2022] Open
Abstract
The evolution of phenotypes occurs through changes both in protein sequence and gene expression levels. Though much of plant morphological evolution can be explained by changes in gene expression, examining its evolution has challenges. To gain a new perspective on organ evolution in plants, we applied a phylotranscriptomics approach. We combined a phylostratigraphic approach with gene expression based on the strand-specific RNA-seq data from seedling, floral bud, and root of 19 Arabidopsis thaliana accessions to examine the age and sequence divergence of transcriptomes from these organs and how they adapted over time. Our results indicate that, among the sense and antisense transcriptomes of these organs, the sense transcriptomes of seedlings are the evolutionarily oldest across all accessions and are the most conserved in amino acid sequence for most accessions. In contrast, among the sense transcriptomes from these same organs, those from floral bud are evolutionarily youngest and least conserved in sequence for most accessions. Different organs have adaptive peaks at different stages in their evolutionary history; however, all three show a common adaptive signal from the Magnoliophyta to Brassicale stage. Our research highlights how phylotranscriptomic analyses can be used to trace organ evolution in the deep history of plant species.
Affiliation(s)
- Li Lei
  - Kansas State University, Department of Plant Pathology, Manhattan, KS, 66506, USA
- Joshua G Steffen
  - Colby-Sawyer College, Natural Sciences Department, New London, NH, 03257, USA
- Edward J Osborne
  - University of Utah, Department of Biology, Salt Lake City, UT, 84111, USA
20
Domazet-Lošo T, Carvunis AR, Albà MM, Šestak MS, Bakaric R, Neme R, Tautz D. No Evidence for Phylostratigraphic Bias Impacting Inferences on Patterns of Gene Emergence and Evolution. Mol Biol Evol 2017; 34:843-856. [PMID: 28087778 PMCID: PMC5400388 DOI: 10.1093/molbev/msw284] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Indexed: 12/05/2022] Open
Abstract
Phylostratigraphy is a computational framework for dating the emergence of DNA and protein sequences in a phylogeny. It has been extensively applied to make inferences on patterns of genome evolution, including patterns of disease gene evolution, ontogeny and de novo gene origination. Phylostratigraphy typically relies on BLAST searches along a species tree, but new simulation studies have raised concerns about the ability of BLAST to detect remote homologues and its impact on phylostratigraphic inferences. Here, we re-assessed these simulations. We found that, even with a possible overall BLAST false negative rate between 11–15%, the large majority of sequences assigned to a recent evolutionary origin by phylostratigraphy is unaffected by technical concerns about BLAST. Where the results of the simulations did cast doubt on previously reported findings, we repeated the original analyses but now excluded all questionable sequences. The originally described patterns remained essentially unchanged. These new analyses strongly support phylostratigraphic inferences, including: genes that emerged after the origin of eukaryotes are more likely to be expressed in the ectoderm than in the endoderm or mesoderm in Drosophila, and the de novo emergence of protein-coding genes from non-genic sequences occurs through proto-gene intermediates in yeast. We conclude that BLAST is an appropriate and sufficiently sensitive tool in phylostratigraphic analysis that does not appear to introduce significant biases into evolutionary pattern inferences.
Affiliation(s)
- Tomislav Domazet-Lošo
  - Laboratory of Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Zagreb, Croatia
  - Catholic University of Croatia, Zagreb, Croatia
- M Mar Albà
  - Evolutionary Genomics Group, Research Programme on Biomedical Informatics, Hospital del Mar Research Institute, Universitat Pompeu Fabra, Barcelona, Spain
  - Catalan Institution for Research and Advanced Studies, Barcelona, Spain
- Martin Sebastijan Šestak
  - Laboratory of Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Zagreb, Croatia
- Robert Bakaric
  - Laboratory of Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Zagreb, Croatia
- Rafik Neme
  - Max-Planck Institute for Evolutionary Biology, Plön, Germany
- Diethard Tautz
  - Max-Planck Institute for Evolutionary Biology, Plön, Germany
21
Turetzek N, Khadjeh S, Schomburg C, Prpic NM. Rapid diversification of homothorax expression patterns after gene duplication in spiders. BMC Evol Biol 2017; 17:168. [PMID: 28709396 PMCID: PMC5513375 DOI: 10.1186/s12862-017-1013-0] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 03/20/2017] [Accepted: 07/04/2017] [Indexed: 01/09/2023] Open
Abstract
Background: Gene duplications provide genetic material for the evolution of new morphological and physiological features. One copy can preserve the original gene functions while the second copy may evolve new functions (neofunctionalisation). Gene duplications may thus provide new genes involved in evolutionary novelties.
Results: We have studied the duplicated homeobox gene homothorax (hth) in the spider species Parasteatoda tepidariorum and Pholcus phalangioides and have compared these data with previously published data from additional spider species. We show that the expression pattern of hth1 is highly conserved among spiders, consistent with the notion that this gene copy preserves the original hth functions. By contrast, hth2 has a markedly different expression profile, especially in the prosomal appendages. The pattern in the pedipalps and legs consists of several segmental rings, suggesting a possible role of hth2 in limb joint development. Intriguingly, however, the hth2 pattern is much less conserved between the species than hth1 and shows a species-specific pattern in each species investigated so far.
Conclusions: We hypothesise that the hth2 gene has gained a new patterning function after gene duplication, but has then undergone a second phase of diversification of its new role in the spider clade. The evolution of hth2 may thus provide an interesting example of a duplicated gene that has not only contributed to genetic diversity through neofunctionalisation, but beyond that has been able to escape evolutionary conservation after neofunctionalisation, thus forming the basis for further genetic diversification.
Affiliation(s)
- Natascha Turetzek
  - Abteilung für Entwicklungsbiologie, Johann-Friedrich-Blumenbach-Institut für Zoologie und Anthropologie, Georg-August-Universität, Göttingen, Germany
  - Göttingen Center for Molecular Biosciences (GZMB), Ernst-Caspari-Haus, Göttingen, Germany
  - Current address: Georg-August-Universität Göttingen, Johann-Friedrich-Blumenbach-Institut für Zoologie und Anthropologie, Abteilung Zelluläre Neurobiologie, 37077, Göttingen, Germany
- Sara Khadjeh
  - Abteilung für Entwicklungsbiologie, Johann-Friedrich-Blumenbach-Institut für Zoologie und Anthropologie, Georg-August-Universität, Göttingen, Germany
  - Göttingen Center for Molecular Biosciences (GZMB), Ernst-Caspari-Haus, Göttingen, Germany
  - Present address: Clinic for Cardiology and Pneumology, University Medical Center Göttingen (UMG), Georg-August-University, Göttingen, Germany
- Christoph Schomburg
  - Abteilung für Entwicklungsbiologie, Johann-Friedrich-Blumenbach-Institut für Zoologie und Anthropologie, Georg-August-Universität, Göttingen, Germany
  - Göttingen Center for Molecular Biosciences (GZMB), Ernst-Caspari-Haus, Göttingen, Germany
- Nikola-Michael Prpic
  - Abteilung für Entwicklungsbiologie, Johann-Friedrich-Blumenbach-Institut für Zoologie und Anthropologie, Georg-August-Universität, Göttingen, Germany
  - Göttingen Center for Molecular Biosciences (GZMB), Ernst-Caspari-Haus, Göttingen, Germany
22
Catania F. From intronization to intron loss: How the interplay between mRNA-associated processes can shape the architecture and the expression of eukaryotic genes. Int J Biochem Cell Biol 2017; 91:136-144. [PMID: 28673893 DOI: 10.1016/j.biocel.2017.06.017] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 03/14/2017] [Revised: 06/25/2017] [Accepted: 06/30/2017] [Indexed: 12/29/2022]
Abstract
Transcription-coupled processes such as capping, splicing, and cleavage/polyadenylation participate in the journey from genes to proteins. Although they are traditionally thought to serve only as steps in the generation of mature mRNAs, a synthesis of available data indicates that these processes could also act as a driving force for the evolution of eukaryotic genes. A theoretical framework for how mRNA-associated processes may shape gene structure and expression has recently been proposed. Factors that promote splicing and cleavage/polyadenylation in this framework compete for access to overlapping or neighboring signals throughout the transcription cycle. These antagonistic interactions allow mechanisms for intron gain and splice site recognition as well as common trends in eukaryotic gene structure and expression to be coherently integrated. Here, I extend this framework further. Observations that largely (but not exclusively) revolve around the formation of DNA-RNA hybrid structures, called R loops, and promoter directionality are integrated. Additionally, the interplay between splicing factors and cleavage/polyadenylation factors is theorized to also affect the formation of intragenic DNA double-stranded breaks thereby contributing to intron loss. The most notable prediction in this proposition is that RNA molecules can mediate intron loss by serving as a template to repair DNA double-stranded breaks. The framework presented here leverages a vast body of empirical observations, logically extending previous suggestions, and generating verifiable predictions to further substantiate the view that the intracellular environment plays an active role in shaping the structure and the expression of eukaryotic genes.
Affiliation(s)
- Francesco Catania
  - Institute for Evolution and Biodiversity, University of Münster, Hüfferstraße 1, 48149 Münster, Germany
23
Luis Villanueva-Cañas J, Ruiz-Orera J, Agea MI, Gallo M, Andreu D, Albà MM. New Genes and Functional Innovation in Mammals. Genome Biol Evol 2017; 9:1886-1900. [PMID: 28854603 PMCID: PMC5554394 DOI: 10.1093/gbe/evx136] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Accepted: 07/20/2017] [Indexed: 12/22/2022] Open
Abstract
The birth of genes that encode new protein sequences is a major source of evolutionary innovation. However, we still understand relatively little about how these genes come into being and which functions they are selected for. To address these questions, we have obtained a large collection of mammalian-specific gene families that lack homologues in other eukaryotic groups. We have combined gene annotations and de novo transcript assemblies from 30 different mammalian species, obtaining ∼6,000 gene families. In general, the proteins in mammalian-specific gene families tend to be short and depleted in aromatic and negatively charged residues. Proteins which arose early in mammalian evolution include milk and skin polypeptides, immune response components, and proteins involved in reproduction. In contrast, the functions of proteins which have a more recent origin remain largely unknown, despite the fact that these proteins also have extensive proteomics support. We identify several previously described cases of genes originated de novo from noncoding genomic regions, supporting the idea that this mechanism frequently underlies the evolution of new protein-coding genes in mammals. Finally, we show that most young mammalian genes are preferentially expressed in testis, suggesting that sexual selection plays an important role in the emergence of new functional genes.
Affiliation(s)
- José Luis Villanueva-Cañas
  - Evolutionary Genomics Group, Research Programme in Biomedical Informatics, Hospital del Mar Research Institute (IMIM), Barcelona, Spain
  - Present address: Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), Barcelona, Spain
- Jorge Ruiz-Orera
  - Evolutionary Genomics Group, Research Programme in Biomedical Informatics, Hospital del Mar Research Institute (IMIM), Barcelona, Spain
- M. Isabel Agea
  - Evolutionary Genomics Group, Research Programme in Biomedical Informatics, Hospital del Mar Research Institute (IMIM), Barcelona, Spain
- Maria Gallo
  - Department of Experimental and Health Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain
- David Andreu
  - Department of Experimental and Health Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain
- M. Mar Albà
  - Evolutionary Genomics Group, Research Programme in Biomedical Informatics, Hospital del Mar Research Institute (IMIM), Barcelona, Spain
  - Department of Experimental and Health Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
24
Basile W, Sachenkova O, Light S, Elofsson A. High GC content causes orphan proteins to be intrinsically disordered. PLoS Comput Biol 2017; 13:e1005375. [PMID: 28355220 PMCID: PMC5389847 DOI: 10.1371/journal.pcbi.1005375] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 07/30/2016] [Revised: 04/12/2017] [Accepted: 01/21/2017] [Indexed: 01/29/2023] Open
Abstract
De novo creation of protein coding genes involves the formation of short ORFs from noncoding regions; some of these ORFs might then become fixed in the population. These orphan proteins need to, at the bare minimum, not cause serious harm to the organism, meaning that they should for instance not aggregate. Therefore, although the creation of short ORFs could be truly random, the fixation should be subjected to some selective pressure. The selective forces acting on orphan proteins have been elusive, and contradictory results have been reported. In Drosophila young proteins are more disordered than ancient ones, while the opposite trend is present in yeast. To the best of our knowledge no valid explanation for this difference has been proposed. To solve this riddle we studied structural properties and age of proteins in 187 eukaryotic organisms. We find that, with the exception of length, there are only small differences in the properties between proteins of different ages. However, when we take the GC content into account we noted that it could explain the opposite trends observed for orphans in yeast (low GC) and Drosophila (high GC). GC content is correlated with codons coding for disorder promoting amino acids. This leads us to propose that intrinsic disorder is not a strong determining factor for fixation of orphan proteins. Instead these proteins largely resemble random proteins given a particular GC level. During evolution the properties of a protein change faster than the GC level causing the relationship between disorder and GC to gradually weaken. We show that the GC content of a genome is of great importance for the properties of an orphan protein. GC content affects the frequency of the codons and this affects the probability for each amino acid to be included in a de novo created protein. The codons encoding for Ala, Pro and Gly contain 80% GC, while codons for Lys, Phe, Asn, Tyr and Ile contain 20% or less. 
The three high GC amino acids are all disorder promoting, while Phe, Tyr and Ile are order promoting. Therefore, random protein sequences at a high GC will be more disordered than the ones created at a low GC. The structural properties of the youngest proteins match to a large degree the properties of random proteins when the GC content is taken into account. In contrast, structural properties of ancient proteins only show a weak correlation with GC content. This suggests that even after fixation in the population, proteins largely resemble random proteins given a certain GC content. Thereafter, during evolution the correlation between structural properties and GC weakens.
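The codon GC figures quoted in this abstract can be checked directly from the standard genetic code. The sketch below is only an illustration, not the authors' method: it assumes an unweighted mean over each amino acid's synonymous codons (the abstract does not say how the percentages were computed), and the names `CODON_TABLE` and `mean_codon_gc` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def mean_codon_gc(amino_acid: str) -> float:
    """Unweighted mean GC fraction over all codons for a one-letter amino acid.

    Assumption: every synonymous codon counts equally; real genomes show
    codon usage bias, which this sketch ignores.
    """
    codons = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]
    gc_bases = sum(base in "GC" for codon in codons for base in codon)
    return gc_bases / (3 * len(codons))


if __name__ == "__main__":
    # High-GC group named in the abstract (Ala, Pro, Gly), then the low-GC group.
    for aa in "APG" + "KFNYI":
        print(aa, round(mean_codon_gc(aa), 3))
```

Under this assumption Ala, Pro and Gly each come out at 10/12 (about 83%) GC, while Lys, Phe, Asn and Tyr come out at 1/6 (about 17%) and Ile at 1/9 (about 11%), which agrees with the "80%" and "20% or less" figures in the abstract.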
Affiliation(s)
- Walter Basile
  - Science for Life Laboratory, Stockholm University, Solna, Sweden
  - Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Oxana Sachenkova
  - Science for Life Laboratory, Stockholm University, Solna, Sweden
  - Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Sara Light
  - Science for Life Laboratory, Stockholm University, Solna, Sweden
  - Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
  - Bioinformatics Infrastructure for Life Sciences (BILS), Linköping University, Linköping, Sweden
- Arne Elofsson
  - Science for Life Laboratory, Stockholm University, Solna, Sweden
  - Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
  - Swedish e-Science Research Center (SeRC), Kungliga Tekniska Högskolan, Stockholm, Sweden
25
Multi-step formation, evolution, and functionalization of new cytoplasmic male sterility genes in the plant mitochondrial genomes. Cell Res 2016; 27:130-146. [PMID: 27725674 DOI: 10.1038/cr.2016.115] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Received: 05/31/2016] [Revised: 08/04/2016] [Accepted: 09/01/2016] [Indexed: 01/28/2023] Open
Abstract
New gene origination is a major source of genomic innovations that confer phenotypic changes and biological diversity. Generation of new mitochondrial genes in plants may cause cytoplasmic male sterility (CMS), which can promote outcrossing and increase fitness. However, how mitochondrial genes originate and evolve in structure and function remains unclear. The rice Wild Abortive type of CMS is conferred by the mitochondrial gene WA352c (previously named WA352) and has been widely exploited in hybrid rice breeding. Here, we reconstruct the evolutionary trajectory of WA352c by the identification and analyses of 11 mitochondrial genomic recombinant structures related to WA352c in wild and cultivated rice. We deduce that these structures arose through multiple rearrangements among conserved mitochondrial sequences in the mitochondrial genome of the wild rice Oryza rufipogon, coupled with substoichiometric shifting and sequence variation. We identify two expressed but nonfunctional protogenes among these structures, and show that they could evolve into functional CMS genes via sequence variations that could relieve the self-inhibitory potential of the proteins. These sequence changes would endow the proteins with the ability to interact with the nucleus-encoded mitochondrial protein COX11, resulting in premature programmed cell death in the anther tapetum and male sterility. Furthermore, we show that the sequences that encode the COX11-interaction domains in these WA352c-related genes have experienced purifying selection during evolution. We propose a model for the formation and evolution of new CMS genes via a "multi-recombination/protogene formation/functionalization" mechanism involving gradual variations in the structure, sequence, copy number, and function.
|
26
|
Li ZW, Chen X, Wu Q, Hagmann J, Han TS, Zou YP, Ge S, Guo YL. On the Origin of De Novo Genes in Arabidopsis thaliana Populations. Genome Biol Evol 2016; 8:2190-202. [PMID: 27401176 PMCID: PMC4987118 DOI: 10.1093/gbe/evw164] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Indexed: 12/23/2022] Open
Abstract
De novo genes, which originate from ancestral nongenic sequences, are one of the most important sources of protein-coding genes. This origination process is crucial for the adaptation of organisms. However, how de novo genes arise and become fixed in a population or species remains largely unknown. Here, we identified 782 de novo genes from the model plant Arabidopsis thaliana and divided them into three types based on the availability of translational evidence, transcriptional evidence, and neither transcriptional nor translational evidence for their origin. Importantly, by integrating multiple types of omics data, including data from genomes, epigenomes, transcriptomes, and translatomes, we found that epigenetic modifications (DNA methylation and histone modification) play an important role in the origination process of de novo genes. Intriguingly, using the transcriptomes and methylomes from the same population of 84 accessions, we found that de novo genes that are transcribed in approximately half of the total accessions within the population are highly methylated, with lower levels of transcription than those transcribed at other frequencies within the population. We hypothesized that, during the origin of de novo gene alleles, those neutralized to low expression states via DNA methylation have relatively high probabilities of spreading and becoming fixed in a population. Our results highlight the process underlying the origin of de novo genes at the population level, as well as the importance of DNA methylation in this process.
Affiliation(s)
- Zi-Wen Li
  - State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, China
- Xi Chen
  - State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Qiong Wu
  - State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, China
- Jörg Hagmann
  - Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Ting-Shen Han
  - State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Yu-Pan Zou
  - State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Song Ge
  - State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, China
- Ya-Long Guo
  - State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, China
|
27
|
McLysaght A, Hurst LD. Open questions in the study of de novo genes: what, how and why. Nat Rev Genet 2016; 17:567-78. [PMID: 27452112 DOI: 10.1038/nrg.2016.78] [Citation(s) in RCA: 125] [Impact Index Per Article: 15.6] [Indexed: 12/12/2022]
Abstract
The study of de novo protein-coding genes is maturing from the ad hoc reporting of individual cases to the systematic analysis of extensive genomic data from several species. We identify three key challenges for this emerging field: understanding how best to identify de novo genes, how they arise and why they spread. We highlight the intellectual challenges of understanding how a de novo gene becomes integrated into pre-existing functions and becomes essential. We suggest that, as with protein sequence evolution, antagonistic co-evolution may be key to de novo gene evolution, particularly for new essential genes and new cancer-associated genes.
Affiliation(s)
- Aoife McLysaght
  - The Smurfit Institute of Genetics, University of Dublin, Trinity College, Dublin 2, Ireland
- Laurence D Hurst
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, Somerset BA2 7AY, UK
|
28
|
McLysaght A, Guerzoni D. New genes from non-coding sequence: the role of de novo protein-coding genes in eukaryotic evolutionary innovation. Philos Trans R Soc Lond B Biol Sci 2016; 370:20140332. [PMID: 26323763 PMCID: PMC4571571 DOI: 10.1098/rstb.2014.0332] [Citation(s) in RCA: 100] [Impact Index Per Article: 12.5] [Indexed: 11/16/2022] Open
Abstract
The origin of novel protein-coding genes de novo was once considered so improbable as to be impossible. In less than a decade, and especially in the last five years, this view has been overturned by extensive evidence from diverse eukaryotic lineages. There is now evidence that this mechanism has contributed a significant number of genes to genomes of organisms as diverse as Saccharomyces, Drosophila, Plasmodium, Arabidopsis and human. From simple beginnings, these genes have in some instances acquired complex structure, regulated expression and important functional roles. New genes are often thought of as dispensable late additions; however, some recent de novo genes in human can play a role in disease. Rather than an extremely rare occurrence, it is now evident that there is a relatively constant trickle of proto-genes released into the testing ground of natural selection. It is currently unknown whether de novo genes arise primarily through an ‘RNA-first’ or ‘ORF-first’ pathway. Either way, evolutionary tinkering with this pool of genetic potential may have been a significant player in the origins of lineage-specific traits and adaptations.
Affiliation(s)
- Aoife McLysaght
  - Smurfit Institute of Genetics, University of Dublin, Trinity College Dublin, Dublin 2, Republic of Ireland
- Daniele Guerzoni
  - Smurfit Institute of Genetics, University of Dublin, Trinity College Dublin, Dublin 2, Republic of Ireland
|
29
|
Emera D, Yin J, Reilly SK, Gockley J, Noonan JP. Origin and evolution of developmental enhancers in the mammalian neocortex. Proc Natl Acad Sci U S A 2016; 113:E2617-26. [PMID: 27114548 PMCID: PMC4868431 DOI: 10.1073/pnas.1603718113] [Citation(s) in RCA: 61] [Impact Index Per Article: 7.6] [Indexed: 12/22/2022] Open
Abstract
Morphological innovations such as the mammalian neocortex may involve the evolution of novel regulatory sequences. However, de novo birth of regulatory elements active during morphogenesis has not been extensively studied in mammals. Here, we use H3K27ac-defined regulatory elements active during human and mouse corticogenesis to identify enhancers that were likely active in the ancient mammalian forebrain. We infer the phylogenetic origins of these enhancers and find that ∼20% arose in the mammalian stem lineage, coincident with the emergence of the neocortex. Implementing a permutation strategy that controls for the nonrandom variation in the ages of background genomic sequences, we find that mammal-specific enhancers are overrepresented near genes involved in cell migration, cell signaling, and axon guidance. Mammal-specific enhancers are also overrepresented in modules of coexpressed genes in the cortex that are associated with these pathways, notably ephrin and semaphorin signaling. Our results also provide insight into the mechanisms of regulatory innovation in mammals. We find that most neocortical enhancers did not originate by en bloc exaptation of transposons. Young neocortical enhancers exhibit smaller H3K27ac footprints and weaker evolutionary constraint in eutherian mammals than older neocortical enhancers. Based on these observations, we present a model of the enhancer life cycle in which neocortical enhancers initially emerge from genomic background as short, weakly constrained "proto-enhancers." Many proto-enhancers are likely lost, but some may serve as nucleation points for complex enhancers to evolve.
Affiliation(s)
- Deena Emera
  - Department of Genetics, Yale School of Medicine, New Haven, CT
- Jun Yin
  - Department of Genetics, Yale School of Medicine, New Haven, CT
- Steven K Reilly
  - Department of Genetics, Yale School of Medicine, New Haven, CT
- Jake Gockley
  - Department of Genetics, Yale School of Medicine, New Haven, CT
- James P Noonan
  - Department of Genetics, Yale School of Medicine, New Haven, CT
  - Kavli Institute for Neuroscience, Yale School of Medicine, New Haven, CT 06510
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511
|
30
|
Koonin EV. The meaning of biological information. Philos Trans A Math Phys Eng Sci 2016; 374:rsta.2015.0065. [PMID: 26857678 PMCID: PMC4760125 DOI: 10.1098/rsta.2015.0065] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Accepted: 07/27/2015] [Indexed: 06/05/2023]
Abstract
Biological information encoded in genomes is fundamentally different from and effectively orthogonal to Shannon entropy. The biologically relevant concept of information has to do with 'meaning', i.e. encoding various biological functions with varying degrees of evolutionary conservation. Apart from direct experimentation, the meaning, or biological information content, can be extracted and quantified from alignments of homologous nucleotide or amino acid sequences but generally not from a single sequence, using appropriately modified information theoretical formulae. In short, information encoded in genomes is defined vertically but not horizontally. Informally but substantially, biological information density seems to be equivalent to the 'meaning' of genomic sequences, which spans the entire range from sharply defined, universal meaning to effective meaninglessness. Large fractions of genomes, up to 90% in some plants, belong within the domain of fuzzy meaning. The sequences with fuzzy meaning can be recruited for various functions, with the meaning subsequently fixed, and also could perform generic functional roles that do not require sequence conservation. Biological meaning is continuously transferred between the genomes of selfish elements and hosts in the process of their coevolution. Thus, in order to adequately describe genome function and evolution, the concepts of information theory have to be adapted to incorporate the notion of meaning that is central to biology.
Affiliation(s)
- Eugene V Koonin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
|
31
|
Neme R, Tautz D. Fast turnover of genome transcription across evolutionary time exposes entire non-coding DNA to de novo gene emergence. eLife 2016; 5:e09977. [PMID: 26836309 PMCID: PMC4829534 DOI: 10.7554/elife.09977] [Citation(s) in RCA: 90] [Impact Index Per Article: 11.3] [Received: 07/08/2015] [Accepted: 02/01/2016] [Indexed: 01/17/2023] Open
Abstract
Deep sequencing analyses have shown that a large fraction of genomes is transcribed, but the significance of this transcription is much debated. Here, we characterize the phylogenetic turnover of poly-adenylated transcripts in a comprehensive sampling of taxa of the mouse (genus Mus), spanning a phylogenetic distance of 10 Myr. Using deep RNA sequencing we find that at a given sequencing depth transcriptome coverage becomes saturated within a taxon, but keeps extending when compared between taxa, even at this very shallow phylogenetic level. Our data show a high turnover of transcriptional states between taxa and that no major transcript-free islands exist across evolutionary time. This suggests that the entire genome can be transcribed into poly-adenylated RNA when viewed at an evolutionary time scale. We conclude that any part of the non-coding genome can potentially become subject to evolutionary functionalization via de novo gene evolution within relatively short evolutionary time spans. DOI:http://dx.doi.org/10.7554/eLife.09977.001 Traditionally, the genome – the sum total of DNA within a cell – was thought to be divided into genes and ‘non-coding’ regions. Genes are copied, or “transcribed”, into molecules called RNA that perform essential tasks in the cell. The roles of the non-coding regions were often less clear, although it has since become apparent that some are also transcribed and generate low levels of RNA molecules. However, many debate how significant this transcription is to living organisms. Neme and Tautz have now used a technique called deep RNA sequencing to study the RNA molecules produced in several different species and types of mice whose last common ancestor lived 10 million years ago. Different species produced RNA molecules from different portions – both genes and non-coding regions – of their genomes. 
Comparing these RNA sequences suggests that changes to the regions that are transcribed occur relatively quickly for a large portion of the genome. Furthermore, there have been no significant areas of the common ancestor’s genome that have not been transcribed at some point in at least one of its descendent species. This therefore suggests that over a relatively short evolutionary period, any part of the genome can acquire the ability to be transcribed and potentially form a new gene. The next challenge is to find out how often these transcribed non-coding parts of the genome show important biochemical activities, and how they find their way into becoming new genes. DOI:http://dx.doi.org/10.7554/eLife.09977.002
Affiliation(s)
- Rafik Neme
  - Max-Planck Institute for Evolutionary Biology, Plön, Germany
- Diethard Tautz
  - Max-Planck Institute for Evolutionary Biology, Plön, Germany
|
32
|
Ruiz-Orera J, Hernandez-Rodriguez J, Chiva C, Sabidó E, Kondova I, Bontrop R, Marqués-Bonet T, Albà M. Origins of De Novo Genes in Human and Chimpanzee. PLoS Genet 2015; 11:e1005721. [PMID: 26720152 PMCID: PMC4697840 DOI: 10.1371/journal.pgen.1005721] [Citation(s) in RCA: 83] [Impact Index Per Article: 9.2] [Received: 07/28/2015] [Accepted: 11/11/2015] [Indexed: 11/18/2022] Open
Abstract
The birth of new genes is an important motor of evolutionary innovation. Whereas many new genes arise by gene duplication, others originate at genomic regions that did not contain any genes or gene copies. Some of these newly expressed genes may acquire coding or non-coding functions and be preserved by natural selection. However, the prevalence and underlying mechanisms of de novo gene emergence remain unclear. In order to obtain a comprehensive view of this process, we have performed in-depth sequencing of the transcriptomes of four mammalian species (human, chimpanzee, macaque, and mouse) and subsequently compared the assembled transcripts and the corresponding syntenic genomic regions. This has resulted in the identification of over five thousand new multiexonic transcriptional events in human and/or chimpanzee that are not observed in the other species. Using comparative genomics, we show that the expression of these transcripts is associated with the gain of regulatory motifs upstream of the transcription start site (TSS) and of U1 snRNP sites downstream of the TSS. In general, these transcripts show little evidence of purifying selection, suggesting that many of them are not functional. However, we find signatures of selection in a subset of de novo genes which have evidence of protein translation. Taken together, the data support a model in which frequently-occurring new transcriptional events in the genome provide the raw material for the evolution of new proteins.
Affiliation(s)
- Jorge Ruiz-Orera
  - Evolutionary Genomics Group, Hospital del Mar Research Institute (IMIM), Barcelona, Spain
- Cristina Chiva
  - Proteomics Unit, Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - Proteomics Unit, Centre de Regulació Genòmica (CRG), Barcelona, Spain
- Eduard Sabidó
  - Proteomics Unit, Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - Proteomics Unit, Centre de Regulació Genòmica (CRG), Barcelona, Spain
- Ivanela Kondova
  - Biomedical Primate Research Center (BPRC), Rijswijk, The Netherlands
- Ronald Bontrop
  - Biomedical Primate Research Center (BPRC), Rijswijk, The Netherlands
- Tomàs Marqués-Bonet
  - Department of Experimental and Health Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- M.Mar Albà
  - Evolutionary Genomics Group, Hospital del Mar Research Institute (IMIM), Barcelona, Spain
  - Department of Experimental and Health Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
|
33
|
Laan L, Koschwanez JH, Murray AW. Evolutionary adaptation after crippling cell polarization follows reproducible trajectories. eLife 2015; 4. [PMID: 26426479 PMCID: PMC4630673 DOI: 10.7554/elife.09638] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.0] [Received: 06/24/2015] [Accepted: 09/30/2015] [Indexed: 12/21/2022] Open
Abstract
Cells are organized by functional modules, which typically contain components whose removal severely compromises the module's function. Despite their importance, these components are not absolutely conserved between parts of the tree of life, suggesting that cells can evolve to perform the same biological functions with different proteins. We evolved Saccharomyces cerevisiae for 1000 generations without the important polarity gene BEM1. Initially the bem1∆ lineages rapidly increase in fitness and then slowly reach >90% of the fitness of their BEM1 ancestors at the end of the evolution. Sequencing their genomes and monitoring polarization reveals a common evolutionary trajectory, with a fixed sequence of adaptive mutations, each improving cell polarization by inactivating proteins. Our results show that organisms can be evolutionarily robust to physiologically destructive perturbations and suggest that recovery by gene inactivation can lead to rapid divergence in the parts list for cell biologically important functions. DOI:http://dx.doi.org/10.7554/eLife.09638.001 Cells use the genetic instructions provided by genes in particular combinations called ‘modules’ to perform particular jobs. Very different organisms can share many of the same modules because certain abilities are fundamental to the survival of all cells and so they have been retained over the course of evolution. That said, these modules may not necessarily involve the same genes because it is often possible to achieve the same result using different components. One way to study how those modules can diversify is to deliberately disrupt one of the genes in a module, and observe how the organism and its descendants respond over many generations. Other genes in these organisms may acquire genetic mutations that enable the genes to take on the role of the missing protein. However, the removal of a single component can be detrimental to the survival of the organisms or may affect many different processes. 
This can make it difficult to understand what is going on. A gene called BEM1 is crucial for yeast cells to establish polarity, that is, to allow the different sides of a cell to become distinct from one another. This activity is essential for the yeast to replicate itself. Previous studies have shown that the BEM1 gene had a different role in other species of fungi, which suggests that yeast may have other genes that previously assumed the role that BEM1 does now. In this study, Laan et al. removed BEM1 from yeast and allowed the population of mutant cells to evolve for a thousand generations. The approach differs from previous studies because Laan et al. deliberately selected for yeast that had acquired multiple genetic mutations that can together almost fully compensate for the loss of BEM1. Initially, the mutant cells grew very slowly, were abnormal in shape and likely to burst open. However, by the end of the experiment, the cells were able to grow almost as well as the original yeast cells had before the gene deletion. Genetic analysis revealed that the deletion of BEM1 triggers the inactivation of other genes that are also involved in the regulation of polarity, which largely restored the ability of the disrupted polarity module to work. This restoration follows a ‘reproducible trajectory’, as the same genes were switched off in the same order in different populations of yeast that were studied at the same time. The work is an example of reproducible evolution, whereby a specific order of changes to gene activity repeatedly enables cells with severe defects in important processes to adapt and restore a gene module, using whatever components they have left. The next challenge will be to understand how the particular roles of important modules affect their adaptability. DOI:http://dx.doi.org/10.7554/eLife.09638.002
Affiliation(s)
- Liedewij Laan
  - FAS Center for Systems Biology, Department of Molecular and Cellular Biology, Harvard University, Cambridge, United States
- John H Koschwanez
  - FAS Center for Systems Biology, Department of Molecular and Cellular Biology, Harvard University, Cambridge, United States
- Andrew W Murray
  - FAS Center for Systems Biology, Department of Molecular and Cellular Biology, Harvard University, Cambridge, United States
|
34
|
Chen JY, Shen QS, Zhou WZ, Peng J, He BZ, Li Y, Liu CJ, Luan X, Ding W, Li S, Chen C, Tan BCM, Zhang YE, He A, Li CY. Emergence, Retention and Selection: A Trilogy of Origination for Functional De Novo Proteins from Ancestral LncRNAs in Primates. PLoS Genet 2015; 11:e1005391. [PMID: 26177073 PMCID: PMC4503675 DOI: 10.1371/journal.pgen.1005391] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 03/20/2015] [Accepted: 06/24/2015] [Indexed: 01/08/2023] Open
Abstract
While some human-specific protein-coding genes have been proposed to originate from ancestral lncRNAs, the transition process remains poorly understood. Here we identified 64 hominoid-specific de novo genes and report a mechanism for the origination of functional de novo proteins from ancestral lncRNAs with precise splicing structures and specific tissue expression profiles. Whole-genome sequencing of dozens of rhesus macaque animals revealed that these lncRNAs are generally not more selectively constrained than other lncRNA loci. The existence of these newly-originated de novo proteins is also not beyond anticipation under neutral expectation, as they generally have longer theoretical lifespan than their current age, due to their GC-rich sequence property enabling stable ORFs with lower chance of nonsense mutations. Interestingly, although the emergence and retention of these de novo genes are likely driven by neutral forces, a population genetics study in 67 human individuals and 82 macaque animals revealed signatures of purifying selection on these genes specifically in the human population, indicating that a proportion of these newly-originated proteins are already functional in human. We thus propose a mechanism for creation of functional de novo proteins from ancestral lncRNAs during primate evolution, which may contribute to human-specific genetic novelties by taking advantage of existing genomic contexts. Although gene duplication has been regarded as a predominant mechanism for creating new genes, recent reports suggested that new proteins could evolve “de novo” from non-coding DNA regions. These de novo genes are also called “motherless” genes due to their lack of ancestral proteins as precursors, while recently we and others found that lncRNAs may represent an intermediate stage of their origination. 
To further elucidate this lncRNA-protein transition process, here we identified 64 hominoid-specific de novo genes and report a new mechanism for the origination of functional de novo proteins from ancestral non-coding transcripts: These non-coding “precursors” are generally not more selectively constrained than other lncRNA loci; and the existence of these de novo proteins is not beyond anticipation under neutral expectation; however, population genetics study in 67 human individuals and 82 macaque animals revealed signatures of purifying selection on these genes specifically in human population, indicating a proportion of these newly-originated proteins are already functional in human. We thus propose a mechanism for creation of functional de novo proteins from ancestral lncRNAs during the primate evolution.
Affiliation(s)
- Jia-Yu Chen
  - Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Qing Sunny Shen
  - Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Wei-Zhen Zhou
  - Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing, China
- Jiguang Peng
  - Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Bin Z. He
  - FAS Center for Systems Biology & Howard Hughes Medical Institute, Harvard University, Cambridge, Massachusetts, United States of America
- Yumei Li
  - Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Chu-Jun Liu
  - Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Xuke Luan
  - Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
  - Peking-Tsinghua Center for Life Sciences, Beijing, China
- Wanqiu Ding
  - Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Shuxian Li
  - Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Chunyan Chen
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Yong E. Zhang
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Aibin He
  - Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
  - Peking-Tsinghua Center for Life Sciences, Beijing, China
- Chuan-Yun Li
  - Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
|
35
|
Pezer Ž, Harr B, Teschke M, Babiker H, Tautz D. Divergence patterns of genic copy number variation in natural populations of the house mouse (Mus musculus domesticus) reveal three conserved genes with major population-specific expansions. Genome Res 2015; 25:1114-24. [PMID: 26149421 PMCID: PMC4509996 DOI: 10.1101/gr.187187.114] [Citation(s) in RCA: 59] [Impact Index Per Article: 6.6] [Received: 11/14/2014] [Accepted: 06/05/2015] [Indexed: 11/29/2022]
Abstract
Copy number variation represents a major source of genetic divergence, yet the evolutionary dynamics of genic copy number variation in natural populations during differentiation and adaptation remain unclear. We applied a read depth approach to genome resequencing data to detect copy number variants (CNVs) ≥1 kb in wild-caught mice belonging to four populations of Mus musculus domesticus. We complemented the bioinformatics analyses with experimental validation using droplet digital PCR. The specific focus of our analysis is CNVs that include complete genes, as these CNVs could be expected to contribute most directly to evolutionary divergence. In total, 1863 transcription units appear to be completely encompassed within CNVs in at least one individual when compared to the reference assembly. Further, 179 of these CNVs show population-specific copy number differences, and 325 are subject to complete deletion in multiple individuals. Among the most copy-number variable genes are three highly conserved genes that encode the splicing factor CWC22, the spindle protein SFI1, and the Holliday junction recognition protein HJURP. These genes exhibit population-specific expansion patterns that suggest involvement in local adaptations. We found that genes that overlap with large segmental duplications are generally more copy-number variable. These genes encode proteins that are relevant for environmental and behavioral interactions, such as vomeronasal and olfactory receptors, as well as major urinary proteins and several proteins of unknown function. The overall analysis shows that genic CNVs contribute more to population differentiation in mice than in humans and may promote and speed up population divergence.
Affiliation(s)
- Željka Pezer
  - Max Planck Institute for Evolutionary Biology, 24306 Plön, Germany
- Bettina Harr
  - Max Planck Institute for Evolutionary Biology, 24306 Plön, Germany
- Meike Teschke
  - Max Planck Institute for Evolutionary Biology, 24306 Plön, Germany
- Hiba Babiker
  - Max Planck Institute for Evolutionary Biology, 24306 Plön, Germany
- Diethard Tautz
  - Max Planck Institute for Evolutionary Biology, 24306 Plön, Germany
|
36
|
De La Torre AR, Lin YC, Van de Peer Y, Ingvarsson PK. Genome-wide analysis reveals diverged patterns of codon bias, gene expression, and rates of sequence evolution in Picea gene families. Genome Biol Evol 2015; 7:1002-15. [PMID: 25747252 PMCID: PMC4419791 DOI: 10.1093/gbe/evv044] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Indexed: 01/13/2023] Open
Abstract
The recent sequencing of several gymnosperm genomes has greatly facilitated studying the evolution of their genes and gene families. In this study, we examine the evidence for expression-mediated selection in the first two fully sequenced representatives of the gymnosperm plant clade (Picea abies and Picea glauca). We use genome-wide estimates of gene expression (>50,000 expressed genes) to study the relationship between gene expression, codon bias, rates of sequence divergence, protein length, and gene duplication. We found that gene expression is correlated with rates of sequence divergence and codon bias, suggesting that natural selection is acting on Picea protein-coding genes for translational efficiency. Gene expression, rates of sequence divergence, and codon bias are correlated with the size of gene families, with large multicopy gene families having, on average, a lower expression level and breadth, lower codon bias, and higher rates of sequence divergence than single-copy gene families. Tissue-specific patterns of gene expression were more common in large gene families with large gene expression divergence than in single-copy families. Recent family expansions combined with large gene expression variation in paralogs and increased rates of sequence evolution suggest that some Picea gene families are rapidly evolving to cope with biotic and abiotic stress. Our study highlights the importance of gene expression and natural selection in shaping the evolution of protein-coding genes in Picea species, and sets the ground for further studies investigating the evolution of individual gene families in gymnosperms.
Affiliation(s)
- Yao-Cheng Lin
- Department of Plant Systems Biology, VIB, and Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Yves Van de Peer
- Department of Plant Systems Biology, VIB, and Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium; Genomics Research Institute, University of Pretoria, South Africa
- Pär K Ingvarsson
- Department of Ecology and Environmental Science, Umeå University, Sweden; Umeå Plant Science Centre, Umeå, Sweden
|
37
|
Bitard-Feildel T, Heberlein M, Bornberg-Bauer E, Callebaut I. Detection of orphan domains in Drosophila using "hydrophobic cluster analysis". Biochimie 2015; 119:244-53. [PMID: 25736992 DOI: 10.1016/j.biochi.2015.02.019] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 12/08/2014] [Accepted: 02/20/2015] [Indexed: 11/30/2022]
Abstract
INTRODUCTION: Comparative genomics has become an important strategy in life science research. While many genes, and the proteins they code for, can be well characterized by assigning orthologs, a significant number of proteins or domains remain obscure "orphans". Some orphans are overlooked by current computational methods because they diverged rapidly; others emerged relatively recently (de novo). Recent research has demonstrated the importance of orphans, and of de novo proteins and domains, for the development of new phenotypic traits and adaptation. New approaches for detecting novel domains are thus of paramount importance. RESULTS: The hydrophobic cluster analysis (HCA) method delineates globular-like domains from the information of a protein sequence and thereby allows bypassing some limitations of established methods that are based on conserved sequence similarity. In this study, HCA is tested for orphan domain detection on 12 Drosophila genomes. After their detection, the orphan domains are classified into two categories, depending on their presence or absence in distantly related species. The two categories show significantly different physico-chemical properties when compared to previously characterized domains from the Pfam database. The newly detected domains have a higher degree of intrinsic disorder and a particular hydrophobic cluster composition. The older the domains are, the more similar their hydrophobic cluster content is to the cluster content of Pfam domains. The results suggest that, over time, newly created domains acquire a canonical set of hydrophobic clusters but conserve some features of intrinsically disordered regions. CONCLUSION: Our results agree with previous findings on orphan domains and suggest that the physico-chemical properties of domains change over long evolutionary time scales. The presented HCA-based method is able to detect domains with unusual properties without relying on prior knowledge, such as the availability of homologs. Therefore, the method has large potential for complementing existing strategies to annotate genomes, and for better understanding how molecular features emerge.
Affiliation(s)
- Tristan Bitard-Feildel
- Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Huefferstrasse 1, D-48149, Germany
- Magdalena Heberlein
- Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Huefferstrasse 1, D-48149, Germany
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Huefferstrasse 1, D-48149, Germany
- Isabelle Callebaut
- IMPMC, Sorbonne Universités - UMR CNRS 7590, UPMC Univ Paris 06, Muséum National d'Histoire Naturelle, IRD UMR 206, IUC 4 Place Jussieu, F-75005 Paris, France
|
38
|
Karn RC, Chung AG, Laukaitis CM. Did androgen-binding protein paralogs undergo neo- and/or Subfunctionalization as the Abp gene region expanded in the mouse genome? PLoS One 2014; 9:e115454. [PMID: 25531410 PMCID: PMC4274081 DOI: 10.1371/journal.pone.0115454] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 09/15/2014] [Accepted: 11/24/2014] [Indexed: 11/19/2022] Open
Abstract
The Androgen-binding protein (Abp) region of the mouse genome contains 30 Abpa genes encoding alpha subunits and 34 Abpbg genes encoding betagamma subunits, their products forming dimers composed of an alpha and a betagamma subunit. We endeavored to determine how many Abp genes are expressed as proteins in tears and saliva, and as transcripts in the exocrine glands producing them. Using standard PCR, we amplified Abp transcripts from cDNA libraries of C57BL/6 mice and found fifteen Abp gene transcripts in the lacrimal gland and five in the submandibular gland. Proteomic analyses identified proteins corresponding to eleven of the lacrimal gland transcripts, all of them different from the three salivary ABPs reported previously. Our qPCR results showed that five of the six transcripts that lacked corresponding proteins are expressed at very low levels compared to those transcripts with proteins. We found 1) no overlap in the repertoires of expressed Abp paralogs in lacrimal gland/tears and salivary glands/saliva; 2) substantial sex-limited expression of lacrimal gland/tear expressed-paralogs in males but no sex-limited expression in females; and 3) that the lacrimal gland/tear expressed-paralogs are found exclusively in ancestral clades 1, 2 and 3 of the five clades described previously while the salivary glands/saliva expressed-paralogs are found only in clade 5. The number of instances of extremely low levels of transcription without corresponding protein production in paralogs specific to tears and saliva suggested the role of subfunctionalization, a derived condition wherein genes that may have been expressed highly in both glands ancestrally were down-regulated subsequent to duplication. Thus, evidence for subfunctionalization can be seen in our data and we argue that the partitioning of paralog expression between lacrimal and salivary glands that we report here occurred as the result of adaptive evolution.
Affiliation(s)
- Robert C. Karn
- College of Medicine, University of Arizona, Tucson, Arizona, 85724, United States of America
- Amanda G. Chung
- College of Medicine, University of Arizona, Tucson, Arizona, 85724, United States of America
- Christina M. Laukaitis
- College of Medicine, University of Arizona, Tucson, Arizona, 85724, United States of America
|
39
|
Bosch TC. Rethinking the role of immunity: lessons from Hydra. Trends Immunol 2014; 35:495-502. [DOI: 10.1016/j.it.2014.07.008] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Received: 05/06/2014] [Revised: 07/28/2014] [Accepted: 07/29/2014] [Indexed: 12/24/2022]
|
40
|
Ruiz-Orera J, Messeguer X, Subirana JA, Alba MM. Long non-coding RNAs as a source of new peptides. eLife 2014; 3:e03523. [PMID: 25233276 PMCID: PMC4359382 DOI: 10.7554/elife.03523] [Citation(s) in RCA: 366] [Impact Index Per Article: 36.6] [Received: 05/30/2014] [Accepted: 08/11/2014] [Indexed: 12/11/2022] Open
Abstract
Deep transcriptome sequencing has revealed the existence of many transcripts that lack long or conserved open reading frames (ORFs) and which have been termed long non-coding RNAs (lncRNAs). The vast majority of lncRNAs are lineage-specific and do not yet have a known function. In this study, we test the hypothesis that they may act as a repository for the synthesis of new peptides. We find that a large fraction of the lncRNAs expressed in cells from six different species is associated with ribosomes. The patterns of ribosome protection are consistent with the translation of short peptides. lncRNAs show coding potential and sequence constraints similar to those of evolutionarily young protein-coding sequences, indicating that they play an important role in de novo protein evolution. DOI:http://dx.doi.org/10.7554/eLife.03523.001
Despite the terms being largely interchangeable in modern language, ‘DNA’ and ‘gene’ do not mean the same thing. A gene is made of DNA and contains the instructions to make a protein, and it is the protein that performs the function of the gene. However, cells in the body also contain DNA that does not form genes. Far from being ‘junk’ DNA with no biological purpose, this DNA has a variety of roles, including affecting how other genes are used. To produce a protein, the DNA sequence of a gene is transcribed into an intermediate molecule called RNA, which is then translated to produce a protein. So-called long non-coding RNA (lncRNA) molecules are also transcribed from DNA, but whether these are translated to make proteins has been a subject of much debate. Indeed, the function of the vast majority of lncRNA molecules is unknown. Ruiz-Orera et al. analyzed RNA sequences collected from earlier experiments on six different species—humans, mice, fish, flies, yeast, and a plant—and found nearly 2500 as yet unstudied lncRNAs in addition to those previously identified. Many of the lncRNAs that Ruiz-Orera et al. investigated could be found lodged inside the cellular machinery used to translate RNA into proteins. Furthermore, these lncRNA molecules are oriented in the machinery as if they are primed and ready for translation, suggesting that many lncRNAs do produce proteins. However, it is unclear how many of these proteins have a useful function. Very few lncRNAs were found in more than one species, suggesting that they have evolved recently. The properties of lncRNA molecules also show many similarities with the properties of ‘young’—recently evolved—genes that are known to produce proteins. The combined findings of Ruiz-Orera et al. therefore suggest that lncRNAs are important for developing new proteins. The emergence of proteins with new functions has been an important driving force in evolution, and this work provides important clues into the first steps of this process. DOI:http://dx.doi.org/10.7554/eLife.03523.002
Affiliation(s)
- Jorge Ruiz-Orera
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics, Hospital del Mar Research Institute, Universitat Pompeu Fabra, Barcelona, Spain
- Xavier Messeguer
- Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, Barcelona, Spain
- Juan Antonio Subirana
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics, Hospital del Mar Research Institute, Universitat Pompeu Fabra, Barcelona, Spain
- M Mar Alba
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics, Hospital del Mar Research Institute, Universitat Pompeu Fabra, Barcelona, Spain
|