1. Lee U, Mozeika SM, Zhao L. A Synergistic, Cultivator Model of De Novo Gene Origination. Genome Biol Evol 2024; 16:evae103. [PMID: 38748819 PMCID: PMC11152449 DOI: 10.1093/gbe/evae103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/12/2024] [Indexed: 06/07/2024] Open
Abstract
The origin and fixation of evolutionarily young genes is a fundamental question in evolutionary biology. However, understanding the origins of newly evolved genes arising de novo from noncoding genomic sequences is challenging. This is partly due to the low likelihood that several neutral or nearly neutral mutations fix before an important novel molecular function appears. The problem is especially acute in populations with a large effective size, where the effect of drift is small. To address this problem, we propose a regulation-focused, cultivator model for de novo gene evolution. This cultivator-focused model posits that each step in a novel variant's evolutionary trajectory is driven by well-defined, selectively advantageous functions of the cultivator genes, rather than solely by the de novo genes, emphasizing the critical role of genome organization in the evolution of new genes.
Affiliation(s)
- UnJin Lee
  - Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA
- Shawn M Mozeika
  - Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA
- Li Zhao
  - Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA

2. Middendorf L, Eicholt LA. Random, de novo, and conserved proteins: How structure and disorder predictors perform differently. Proteins 2024; 92:757-767. [PMID: 38226524 DOI: 10.1002/prot.26652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 10/18/2023] [Accepted: 12/01/2023] [Indexed: 01/17/2024]
Abstract
Understanding the emergence and structural characteristics of de novo and random proteins is crucial for unraveling protein evolution and designing novel enzymes. However, experimental determination of their structures remains challenging. Recent advances in protein structure prediction, particularly AlphaFold2 (AF2), have expanded our knowledge of protein structures, but their applicability to de novo and random proteins is unclear. In this study, we investigate the structural predictions and confidence scores of AF2 and of the protein language model-based predictor ESMFold for de novo and conserved proteins from Drosophila and for a dataset of comparable random proteins. We find that the structural predictions for de novo and random proteins differ significantly from those for conserved proteins. Interestingly, a positive correlation between disorder and confidence scores (pLDDT) is observed for de novo and random proteins, in contrast to the negative correlation observed for conserved proteins. Furthermore, the performance of structure predictors for de novo and random proteins is hampered by the lack of sequence identity. We also observe that the median predicted disorder of random proteins fluctuates across sequence-length quartiles, suggesting that sequence length influences disorder predictions. In conclusion, while structure predictors provide initial insights into the structural composition of de novo and random proteins, their accuracy and applicability to such proteins remain limited. Experimental determination of their structures is necessary for a comprehensive understanding. The positive correlation between disorder and pLDDT could imply a potential for conditional folding and transient binding interactions of de novo and random proteins.
Affiliation(s)
- Lasse Middendorf
  - Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
- Lars A Eicholt
  - Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany

3. Linnenbrink M, Breton G, Misra P, Pfeifle C, Dutheil JY, Tautz D. Experimental Evaluation of a Direct Fitness Effect of the De Novo Evolved Mouse Gene Pldi. Genome Biol Evol 2024; 16:evae084. [PMID: 38742287 PMCID: PMC11091481 DOI: 10.1093/gbe/evae084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/16/2024] [Indexed: 05/16/2024] Open
Abstract
De novo evolved genes emerge from random parts of noncoding sequences and therefore have no homologs from which a function could be inferred. While expression analysis and knockout experiments can provide insights into the function, they do not directly test whether the gene is beneficial for its carrier. Here, we used a seminatural environment experiment to test the fitness effect of the previously identified de novo evolved mouse gene Pldi, which has been implicated in sperm differentiation. We used a knockout mouse strain for this gene and competed it against its parental wildtype strain for several generations of free reproduction. We found that the knockout (ko) allele frequency decreased consistently across three replicates of the experiment. Using an approximate Bayesian computation framework that simulated the data under a demographic scenario mimicking the experiment's demography, we estimated a selection coefficient between 0.21 and 0.61 for the wildtype allele relative to the ko allele in males, depending on the model. This implies a relatively strong selective advantage, which would fix the new gene in fewer than a hundred generations after its emergence.
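The closing claim above (fixation within about a hundred generations) can be illustrated with textbook deterministic population genetics. The sketch below rests on assumptions that are not taken from the paper: genic selection with the favoured allele at relative fitness 1 + s, no drift, a start from a single copy among 2N gene copies, and a hypothetical population of 1,000 diploids. It is not the authors' approximate Bayesian computation framework. Because logit(p) grows by ln(1 + s) each generation under this model, the loop can be cross-checked against a closed form.

```python
import math


def fixation_time_iterative(s: float, two_n: int) -> int:
    """Generations for a favoured allele to rise from one copy (1/2N) to
    within one copy of fixation (1 - 1/2N), assuming deterministic genic
    selection and ignoring drift (an illustrative assumption)."""
    p = 1.0 / two_n
    target = 1.0 - 1.0 / two_n
    t = 0
    while p < target:
        # Favoured allele has relative fitness 1 + s; the other allele has 1.
        p = p * (1.0 + s) / (1.0 + s * p)
        t += 1
    return t


def fixation_time_closed_form(s: float, two_n: int) -> float:
    """Same quantity from logit(p_t) = logit(p_0) + t * ln(1 + s)."""
    return 2.0 * math.log(two_n - 1) / math.log(1.0 + s)


if __name__ == "__main__":
    two_n = 2_000  # hypothetical: 1,000 diploid individuals
    for s in (0.21, 0.61):  # the range of selection coefficients reported above
        print(s, fixation_time_iterative(s, two_n),
              round(fixation_time_closed_form(s, two_n), 1))
```

Under these assumptions the time grows only logarithmically with N, so the order of magnitude is not very sensitive to the exact population size; drift and dominance, which the sketch ignores, would change the details.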
Affiliation(s)
- Miriam Linnenbrink
  - Department of Evolutionary Genetics, Max-Planck Institute for Evolutionary Biology, 24306 Plön, Germany
  - Present address: Max Planck Institute for Biological Intelligence, 82152 Martinsried, Germany
- Gwenna Breton
  - Department of Evolutionary Genetics, Max-Planck Institute for Evolutionary Biology, 24306 Plön, Germany
  - Present address: Clinical Genomics Gothenburg, Science for Life Laboratory, Sahlgrenska Academy, University of Gothenburg, and Center for Medical Genomics, Department of Clinical Genetics and Genomics, Sahlgrenska University Hospital, Sweden
- Pallavi Misra
  - Department of Evolutionary Genetics, Max-Planck Institute for Evolutionary Biology, 24306 Plön, Germany
  - Present address: Laboratory Corporation of America (LabCorp), Westborough, MA 01581, USA
- Christine Pfeifle
  - Department of Evolutionary Genetics, Max-Planck Institute for Evolutionary Biology, 24306 Plön, Germany
- Julien Y Dutheil
  - Department of Evolutionary Genetics, Max-Planck Institute for Evolutionary Biology, 24306 Plön, Germany
- Diethard Tautz
  - Department of Evolutionary Genetics, Max-Planck Institute for Evolutionary Biology, 24306 Plön, Germany

4. Takeuchi N, Fullmer MS, Maddock DJ, Poole AM. The Constructive Black Queen hypothesis: new functions can evolve under conditions favouring gene loss. ISME J 2024; 18:wrae011. [PMID: 38366199 PMCID: PMC10942775 DOI: 10.1093/ismejo/wrae011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Revised: 01/17/2024] [Accepted: 01/19/2024] [Indexed: 02/18/2024]
Abstract
Duplication is a major route for the emergence of new gene functions. However, the emergence of new gene functions via this route may be reduced in prokaryotes, as redundant genes are often rapidly purged. In lineages with compact, streamlined genomes, it thus appears challenging for novel function to emerge via duplication and divergence. A further pressure contributing to gene loss occurs under Black Queen dynamics, as cheaters that lose the capacity to produce a public good can instead acquire it from neighbouring producers. We propose that Black Queen dynamics can favour the emergence of new function because, under an emerging Black Queen dynamic, there is high gene redundancy spread across a community of interacting cells. Using computational modelling, we demonstrate that new gene functions can emerge under Black Queen dynamics. This result holds even if there is deletion bias due to low duplication rates and selection against redundant gene copies resulting from the high cost associated with carrying a locus. However, when the public good production costs are high, Black Queen dynamics impede the fixation of new functions. Our results expand the mechanisms by which new gene functions can emerge in prokaryotic systems.
Affiliation(s)
- Nobuto Takeuchi
  - School of Biological Sciences, University of Auckland, Auckland 1010, New Zealand
  - Universal Biology Institute, University of Tokyo, Tokyo 113-0033, Japan
  - Department of Biology, Faculty of Sciences, Kyushu University, Fukuoka 819-0395, Japan
- Matthew S Fullmer
  - School of Biological Sciences, University of Auckland, Auckland 1010, New Zealand
- Danielle J Maddock
  - School of Biological Sciences, University of Auckland, Auckland 1010, New Zealand
- Anthony M Poole
  - School of Biological Sciences, University of Auckland, Auckland 1010, New Zealand

5. Wesp V, Theißen G, Schuster S. Statistical analysis of synonymous and stop codons in pseudo-random and real sequences as a function of GC content. Sci Rep 2023; 13:22996. [PMID: 38151539 PMCID: PMC10752896 DOI: 10.1038/s41598-023-49626-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Accepted: 12/10/2023] [Indexed: 12/29/2023] Open
Abstract
Knowledge of the frequencies of synonymous triplets in protein-coding and non-coding DNA stretches can be used in gene finding. These frequencies depend on the GC content of the genome or of parts of it. Stop codons are an example of particular interest, because they are relevant to the definition of open reading frames. Pseudo-random sequences provide a generic case, especially when they code for complex proteins or when they are non-coding and not subject to selection pressure. Here, for such sequences and for all 25 known genetic codes, we calculate the frequency of each amino acid and of stop codons from their sets of codons, as a function of GC content. The amino acids can be classified into five groups according to the GC content at which their expected frequency reaches its maximum. We determine the overall Shannon information based on groups of synonymous codons and show that it reaches its maximum at a GC content of 43.3% (for the standard code). This is in line with the observation that in most fungi, plants, and animals, this genomic parameter lies in the range from 35 to 50%. By analysing natural sequences, we show that there is a clear bias for triplets corresponding to stop codons near the 5'- and 3'-splice sites in the introns of various clades.
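The expected amino acid and stop codon frequencies described above follow from treating a pseudo-random sequence as independent bases, with G and C each at frequency GC/2 and A and T each at (1 - GC)/2. The sketch below assumes that model and covers only the standard code (NCBI translation table 1), not all 25 codes. It also assumes that the "Shannon information based on groups of synonymous codons" is the entropy over 21 classes (20 amino acids plus stop); the paper's exact definition may differ, so the grid maximum printed here is not guaranteed to reproduce the reported 43.3%.

```python
import math
from collections import defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}


def base_probs(gc: float) -> dict[str, float]:
    """Independent-base model: G and C share the GC fraction equally, A and T the rest."""
    return {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}


def class_frequencies(gc: float) -> dict[str, float]:
    """Expected frequency of each amino acid ('*' = stop) in a pseudo-random sequence."""
    p = base_probs(gc)
    freqs: dict[str, float] = defaultdict(float)
    for codon, aa in CODE.items():
        freqs[aa] += p[codon[0]] * p[codon[1]] * p[codon[2]]
    return dict(freqs)


def synonymous_entropy(gc: float) -> float:
    """Shannon entropy (bits) over the 21 synonymous-codon classes."""
    return -sum(f * math.log2(f) for f in class_frequencies(gc).values() if f > 0)


if __name__ == "__main__":
    for gc in (0.3, 0.433, 0.5, 0.7):
        print(gc, round(class_frequencies(gc)["*"], 4), round(synonymous_entropy(gc), 4))
    grid = [i / 1000 for i in range(1, 1000)]
    print("entropy maximised near GC =", max(grid, key=synonymous_entropy))
```

At GC = 0.5 every codon has probability 1/64, so the stop frequency is 3/64; lower GC content raises it, because all three standard stop codons (TAA, TAG, TGA) are AT-rich.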
Affiliation(s)
- Valentin Wesp
  - Department of Bioinformatics, Matthias Schleiden Institute, Friedrich Schiller University Jena, Ernst-Abbe-Platz 2, 07743, Jena, Germany
- Günter Theißen
  - Department of Genetics, Matthias Schleiden Institute, Friedrich Schiller University Jena, Philosophenweg 12, 07743, Jena, Germany
- Stefan Schuster
  - Department of Bioinformatics, Matthias Schleiden Institute, Friedrich Schiller University Jena, Ernst-Abbe-Platz 2, 07743, Jena, Germany

6. Frumkin I, Laub MT. Selection of a de novo gene that can promote survival of Escherichia coli by modulating protein homeostasis pathways. Nat Ecol Evol 2023; 7:2067-2079. [PMID: 37945946 PMCID: PMC10697842 DOI: 10.1038/s41559-023-02224-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/11/2023] [Accepted: 09/12/2023] [Indexed: 11/12/2023]
Abstract
Cellular novelty can emerge when non-functional loci become functional genes in a process termed de novo gene birth. But how proteins with random amino acid sequences beneficially integrate into existing cellular pathways remains poorly understood. We screened ~10^8 genes, generated from random nucleotide sequences and devoid of homology to natural genes, for their ability to rescue growth arrest of Escherichia coli cells producing the ribonuclease toxin MazF. We identified ~2,000 genes that could promote growth, probably by reducing transcription from the promoter driving toxin expression. Additionally, one random protein, named Random antitoxin of MazF (RamF), modulated protein homeostasis by interacting with chaperones, leading to MazF proteolysis and a consequent loss of its toxicity. Finally, we demonstrate that random proteins can improve during evolution by identifying beneficial mutations that turned RamF into a more efficient inhibitor. Our work provides a mechanistic basis for how de novo gene birth can produce functional proteins that effectively benefit cells evolving under stress.
Affiliation(s)
- Idan Frumkin
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Michael T Laub
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Cambridge, MA, USA

7. Heames B, Buchel F, Aubel M, Tretyachenko V, Loginov D, Novák P, Lange A, Bornberg-Bauer E, Hlouchová K. Experimental characterization of de novo proteins and their unevolved random-sequence counterparts. Nat Ecol Evol 2023; 7:570-580. [PMID: 37024625 PMCID: PMC10089919 DOI: 10.1038/s41559-023-02010-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 01/14/2022] [Accepted: 02/10/2023] [Indexed: 04/08/2023]
Abstract
De novo gene emergence provides a route for new proteins to be formed from previously non-coding DNA. Proteins born in this way are considered random sequences and are typically assumed to lack defined structure. While it remains unclear how likely a de novo protein is to assume a soluble and stable tertiary structure, intersecting evidence from random sequence and de novo-designed proteins suggests that native-like biophysical properties are abundant in sequence space. Taking putative de novo proteins identified in human and fly, we experimentally characterize a library of these sequences to assess their solubility and structure propensity. We compare this library to a set of synthetic random proteins with no evolutionary history. Bioinformatic prediction suggests that de novo proteins may have remarkably similar distributions of biophysical properties to unevolved random sequences of a given length and amino acid composition. However, upon expression in vitro, de novo proteins exhibit moderately higher solubility, which is further increased by the DnaK chaperone system. We suggest that while synthetic random sequences are a useful proxy for de novo proteins in terms of structure propensity, de novo proteins may be better integrated into the cellular system than random expectation, given their higher solubility.
Affiliation(s)
- Brennen Heames
  - Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Filip Buchel
  - Department of Cell Biology, Charles University, BIOCEV, Prague, Czech Republic
  - Department of Biochemistry, Charles University, Prague, Czech Republic
- Margaux Aubel
  - Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Dmitry Loginov
  - Institute of Microbiology, Czech Academy of Sciences, Prague, Czech Republic
- Petr Novák
  - Institute of Microbiology, Czech Academy of Sciences, Prague, Czech Republic
- Andreas Lange
  - Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Erich Bornberg-Bauer
  - Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
  - Department of Protein Evolution, MPI for Developmental Biology, Tübingen, Germany
- Klára Hlouchová
  - Department of Cell Biology, Charles University, BIOCEV, Prague, Czech Republic
  - Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague, Czech Republic

8. Sandmann CL, Schulz JF, Ruiz-Orera J, Kirchner M, Ziehm M, Adami E, Marczenke M, Christ A, Liebe N, Greiner J, Schoenenberger A, Muecke MB, Liang N, Moritz RL, Sun Z, Deutsch EW, Gotthardt M, Mudge JM, Prensner JR, Willnow TE, Mertins P, van Heesch S, Hubner N. Evolutionary origins and interactomes of human, young microproteins and small peptides translated from short open reading frames. Mol Cell 2023; 83:994-1011.e18. [PMID: 36806354 PMCID: PMC10032668 DOI: 10.1016/j.molcel.2023.01.023] [Citation(s) in RCA: 30] [Impact Index Per Article: 30.0] [Received: 09/08/2022] [Revised: 12/12/2022] [Accepted: 01/25/2023] [Indexed: 02/19/2023]
Abstract
All species continuously evolve short open reading frames (sORFs) that can be templated for protein synthesis and may provide raw material for evolutionary adaptation. We analyzed the evolutionary origins of 7,264 recently cataloged human sORFs and found that most were evolutionarily young and had emerged de novo. We additionally identified 221 previously missed sORFs potentially translated into peptides of up to 15 amino acids, all of which are smaller than the smallest human microprotein annotated to date. To investigate the bioactivity of sORF-encoded small peptides and young microproteins, we subjected 266 candidates to a mass-spectrometry-based interactome screen with motif resolution. Based on these interactomes and additional cellular assays, we can associate several candidates with mRNA splicing, translational regulation, and endocytosis. Our work provides insights into the evolutionary origins and interaction potential of young and small proteins, thereby helping to elucidate this underexplored territory of the human proteome.
Affiliation(s)
- Clara-L Sandmann
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347 Berlin, Germany
- Jana F Schulz
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347 Berlin, Germany
- Jorge Ruiz-Orera
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Marieluise Kirchner
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Core Facility Proteomics, 10117 Berlin, Germany
- Matthias Ziehm
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Core Facility Proteomics, 10117 Berlin, Germany
- Eleonora Adami
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Maike Marczenke
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Annabel Christ
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Nina Liebe
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Johannes Greiner
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Aaron Schoenenberger
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Michael B Muecke
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347 Berlin, Germany; Charité-Universitätsmedizin, 10117 Berlin, Germany
- Ning Liang
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Zhi Sun
  - Institute for Systems Biology, Seattle, WA 98109, USA
- Michael Gotthardt
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347 Berlin, Germany; Charité-Universitätsmedizin, 10117 Berlin, Germany
- Jonathan M Mudge
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- John R Prensner
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Division of Pediatric Hematology/Oncology, Boston Children's Hospital, Boston, MA 02115, USA
- Thomas E Willnow
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; Department of Biomedicine, Aarhus University, 8000 Aarhus, Denmark
- Philipp Mertins
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Core Facility Proteomics, 10117 Berlin, Germany
- Norbert Hubner
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347 Berlin, Germany; Charité-Universitätsmedizin, 10117 Berlin, Germany

9. Evolution and implications of de novo genes in humans. Nat Ecol Evol 2023:10.1038/s41559-023-02014-y. [PMID: 36928843 DOI: 10.1038/s41559-023-02014-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 07/24/2022] [Accepted: 02/06/2023] [Indexed: 03/18/2023]
Abstract
Genes and translated open reading frames (ORFs) that emerged de novo from previously non-coding sequences provide species with opportunities for adaptation. When aberrantly activated, some human-specific de novo genes and ORFs have disease-promoting properties, for instance driving tumour growth. Thousands of putative de novo coding sequences have been described in humans, but we still do not know what fraction of those ORFs has readily acquired a function. Here, we discuss the challenges and controversies surrounding the detection, mechanisms of origin, annotation, validation and characterization of de novo genes and ORFs. Through manual curation of literature and databases, we provide a comprehensive table of most de novo genes reported in humans to date. We re-evaluate each locus by tracing the enabling mutations and list proposed disease associations, protein characteristics and supporting evidence for translation and protein detection. This work will support future explorations of de novo genes and ORFs in humans.

10. Random and Natural Non-Coding RNA Have Similar Structural Motif Patterns but Differ in Bulge, Loop, and Bond Counts. Life (Basel) 2023; 13:life13030708. [PMID: 36983865 PMCID: PMC10054693 DOI: 10.3390/life13030708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2022] [Revised: 02/15/2023] [Accepted: 02/27/2023] [Indexed: 03/08/2023] Open
Abstract
An important question in evolutionary biology is whether (and in what ways) genotype–phenotype (GP) map biases can influence evolutionary trajectories. Untangling the relative roles of natural selection and biases (and other factors) in shaping phenotypes can be difficult. Because RNA secondary structure (SS) can be analyzed in detail mathematically and computationally, is biologically relevant, and is supported by a wealth of bioinformatic data, it offers a good model system for studying the role of bias. For short RNA (length L ≤ 126), it has recently been shown that natural and random RNA types are structurally very similar, suggesting that bias strongly constrains evolutionary dynamics. Here, we extend these results with an emphasis on much larger RNA, with lengths up to 3000 nucleotides. By examining both abstract shapes and structural motif frequencies (i.e., the number of helices, bonds, bulges, junctions, and loops), we find that large natural and random structures are also very similar, especially when contrasted with typical structures sampled from the space of all possible RNA structures. Our motif frequency study yields a further result: the frequencies of different motifs can be used in machine learning algorithms to classify random and natural RNA with high accuracy, especially for longer RNA (e.g., ROC AUC 0.86 for L = 1000). The most important motifs for classification are the numbers of bulges, loops, and bonds. This finding may be useful for using SS to detect candidates for functional RNA within ‘junk’ DNA regions.
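The structural motifs counted in this study (bonds, helices, bulges, junctions, and loops) can be read off a secondary structure written in dot-bracket notation. The sketch below is a minimal, hypothetical counter, assuming pseudoknot-free structures such as those produced by ViennaRNA's RNAfold; it uses a standard loop decomposition in which each base pair is classified by how many pairs it directly encloses. The motif definitions used by the authors may differ in detail (for example, in whether the exterior loop is counted).

```python
from dataclasses import dataclass


@dataclass
class MotifCounts:
    bonds: int = 0      # base pairs
    helices: int = 0    # maximal runs of stacked pairs
    hairpins: int = 0   # loops closed by one pair with no inner pairs
    bulges: int = 0     # two-pair loops with unpaired bases on one side only
    interior: int = 0   # two-pair loops with unpaired bases on both sides
    junctions: int = 0  # multiloops: a closing pair plus two or more inner pairs


def pair_table(db: str) -> list[int]:
    """Map each position to its partner index, or -1 if unpaired."""
    pt = [-1] * len(db)
    stack: list[int] = []
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            j = stack.pop()
            pt[i], pt[j] = j, i
    if stack:
        raise ValueError("unbalanced '('")
    return pt


def count_motifs(db: str) -> MotifCounts:
    pt = pair_table(db)
    c = MotifCounts()
    for i, j in enumerate(pt):
        if j <= i:  # skip unpaired positions and visit each pair once, from its 5' end
            continue
        c.bonds += 1
        # A pair starts a new helix unless (i-1, j+1) is a pair stacked on it.
        if not (i > 0 and pt[i - 1] == j + 1):
            c.helices += 1
        # Collect the pairs directly enclosed by (i, j), skipping over their interiors.
        branches, k = [], i + 1
        while k < j:
            if pt[k] > k:
                branches.append((k, pt[k]))
                k = pt[k] + 1
            else:
                k += 1
        if not branches:
            c.hairpins += 1
        elif len(branches) == 1:
            inner_i, inner_j = branches[0]
            left, right = inner_i - i - 1, j - inner_j - 1
            if (left == 0) != (right == 0):
                c.bulges += 1
            elif left > 0 and right > 0:
                c.interior += 1
            # left == right == 0 is a plain stack, not a loop
        else:
            c.junctions += 1
    return c


if __name__ == "__main__":
    print(count_motifs("((.((...))))"))
```

For example, `count_motifs("((.((...))))")` reports four bonds, two helices, one bulge, and one hairpin loop.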

11. Karlowski WM, Varshney D, Zielezinski A. Taxonomically Restricted Genes in Bacillus may Form Clusters of Homologs and Can be Traced to a Large Reservoir of Noncoding Sequences. Genome Biol Evol 2023; 15:7039703. [PMID: 36790099 PMCID: PMC10003748 DOI: 10.1093/gbe/evad023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Revised: 01/09/2023] [Accepted: 02/08/2023] [Indexed: 02/16/2023] Open
Abstract
Taxonomically restricted genes (TRGs) are unique to a defined group of organisms and may act as genetic determinants of lineage-specific biological properties. Here, we explore the TRGs of the highly diverse and economically important Bacillus bacteria by examining commonly used TRG identification parameters and data sources. We show that sequence similarity thresholds and the composition and size of the reference database significantly affect the identification process. We then applied stringent TRG search parameters and expanded the identification procedure by incorporating an analysis of noncoding and non-syntenic regions of non-Bacillus genomes. A multiplex annotation procedure minimized the number of false-positive TRG predictions and showed that nearly one-third of the alleged TRGs could be mapped to genes missed in genome annotations. We traced the putative origin of TRGs by identifying homologous, noncoding genomic regions in non-Bacillus species and detected sequence changes that could transform these regions into protein-coding genes. In addition, our analysis indicated that Bacillus TRGs represent a specific group of genes, mostly showing sequence properties intermediate between genes that are conserved across multiple taxa and nonannotated peptides encoded by open reading frames.
Affiliation(s)
- Wojciech M Karlowski
  - Department of Computational Biology, Adam Mickiewicz University in Poznan, Uniwersytetu Poznanskiego 6, Poznan, Poland
- Deepti Varshney
  - Department of Computational Biology, Adam Mickiewicz University in Poznan, Uniwersytetu Poznanskiego 6, Poznan, Poland
- Andrzej Zielezinski
  - Department of Computational Biology, Adam Mickiewicz University in Poznan, Uniwersytetu Poznanskiego 6, Poznan, Poland

12. Çakır U, Gabed N, Brunet M, Roucou X, Kryvoruchko I. Mosaic translation hypothesis: chimeric polypeptides produced via multiple ribosomal frameshifting as a basis for adaptability. FEBS J 2023; 290:370-378. [PMID: 34743413 DOI: 10.1111/febs.16269] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/07/2021] [Revised: 10/03/2021] [Accepted: 11/05/2021] [Indexed: 02/05/2023]
Abstract
How many different proteins can be produced from a single spliced transcript? Genome annotation projects overlook the coding potential of reading frames other than those of the reference open reading frames (refORFs). Recently, alternative open reading frames (altORFs) and their translational products, alternative proteins, have been shown to carry out important functions in various organisms. AltORFs overlapping refORFs or other altORFs in a different reading frame may be involved in one fundamental mechanism that has so far been overlooked. A few years ago, it was proposed that altORFs may act as building blocks for chimeric (mosaic) polypeptides, which are produced via multiple ribosomal frameshifting events from a single mature transcript. We adopt terminology from that earlier discussion and call this mechanism mosaic translation. This way of extracting and combining genetic information may significantly increase proteome diversity. Thus, we hypothesize that this mechanism may have contributed to the flexibility and adaptability of organisms to a variety of environmental conditions. Specialized ribosomes acting as sensors probably played a central role in this process. Importantly, mosaic translation may be the main source of protein diversity in genomes that lack alternative splicing. Mosaic translation is a testable hypothesis, although its direct demonstration is challenging. Should mosaic translation occur, we would currently be greatly underestimating the complexity of translation mechanisms and thus of the proteome.
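Mosaic translation is a hypothesis, so no code can demonstrate that it happens; a toy translator can still show what the proposed mechanism would do to a single transcript. Each frameshift moves the ribosome into another reading frame, so segments of different ORFs, including altORFs, are concatenated into one chimeric polypeptide, and a stop codon in the original frame can be bypassed. In the sketch below, the function name, the user-chosen shift positions, and the use of the standard genetic code (NCBI table 1) are illustrative assumptions; it says nothing about where or how often real ribosomes would shift.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}


def translate_with_shifts(seq: str, start: int, shifts: dict[int, int]) -> str:
    """Toy translation from `start`, with frameshifts after chosen codons.

    `shifts` maps a 0-based codon index (counted from `start`) to a shift in
    nucleotides applied before the next codon is read: +1 skips a base and
    -1 re-reads one. Translation stops at a stop codon or when fewer than
    three bases remain. The sequence must contain only A, C, G, T.
    """
    if any(3 + d <= 0 for d in shifts.values()):
        raise ValueError("a shift of -3 or less would stall the toy ribosome")
    seq = seq.upper()
    pos, n_codon, peptide = start, 0, []
    while pos + 3 <= len(seq):
        aa = CODE[seq[pos:pos + 3]]
        if aa == "*":
            break
        peptide.append(aa)
        pos += 3 + shifts.get(n_codon, 0)  # move on, possibly into another frame
        n_codon += 1
    return "".join(peptide)


if __name__ == "__main__":
    transcript = "ATGTAAAGCTGA"
    print(translate_with_shifts(transcript, 0, {}))
    print(translate_with_shifts(transcript, 0, {0: 1}))
```

For example, without shifts `translate_with_shifts("ATGTAAAGCTGA", 0, {})` stops after `M` at the in-frame TAA, whereas a +1 shift after the first codon reads `AAA` and `GCT` from another frame and yields `MKA`.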
Affiliation(s)
- Umut Çakır
  - Molecular Biology and Genetics Department, Faculty of Arts and Sciences, Boğaziçi University, Istanbul, Turkey
- Noujoud Gabed
  - Cellular and Molecular Biology Department, Oran High School of Biological Sciences (ESSBO), Oran, Algeria
- Marie Brunet
  - Department of Pediatrics, Medical Genetics Service, Université de Sherbrooke, QC, Canada
  - Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke (CRCHUS), QC, Canada
- Xavier Roucou
  - Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke (CRCHUS), QC, Canada
  - Department of Biochemistry and Functional Genomics, Université de Sherbrooke, QC, Canada
- Igor Kryvoruchko
  - Molecular Biology and Genetics Department, Faculty of Arts and Sciences, Boğaziçi University, Istanbul, Turkey

13. Berkeley RF, Debelouchina GT. Chemical tools for study and modulation of biomolecular phase transitions. Chem Sci 2022; 13:14226-14245. [PMID: 36545140 PMCID: PMC9749140 DOI: 10.1039/d2sc04907d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2022] [Accepted: 11/21/2022] [Indexed: 11/23/2022] Open
Abstract
Biomolecular phase transitions play an important role in organizing cellular processes in space and time. Methods and tools for studying these transitions, and the intrinsically disordered proteins (IDPs) that often drive them, are typically less developed than tools for studying their folded protein counterparts. In this perspective, we assess the current landscape of chemical tools for studying IDPs, with a specific focus on protein liquid-liquid phase separation (LLPS). We highlight methodologies that enable imaging and spectroscopic studies of these systems, including site-specific labeling with small molecules and the diverse range of capabilities offered by inteins and protein semisynthesis. We discuss strategies for introducing post-translational modifications that are central to IDP and LLPS function and regulation. We also investigate the nascent field of noncovalent small-molecule modulators of LLPS. We hope that this review of the state-of-the-art in chemical tools for interrogating IDPs and LLPS, along with an associated perspective on areas of unmet need, can serve as a valuable and timely resource for these rapidly expanding fields of study.
Affiliation(s)
- Raymond F. Berkeley
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, CA, USA
- Galia T. Debelouchina
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, CA, USA
14
Wiberg RAW, Viktorin G, Schärer L. Mating strategy predicts gene presence/absence patterns in a genus of simultaneously hermaphroditic flatworms. Evolution 2022; 76:3054-3066. [PMID: 36199200 PMCID: PMC10092323 DOI: 10.1111/evo.14635] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2022] [Accepted: 09/28/2022] [Indexed: 01/22/2023]
Abstract
Gene repertoire turnover is a characteristic of genome evolution. However, we lack well-replicated analyses of presence/absence patterns associated with different selection contexts. Here, we study ∼100 transcriptome assemblies across Macrostomum, a genus of simultaneously hermaphroditic flatworms exhibiting multiple convergent shifts in mating strategy and associated reproductive morphologies. Many species mate reciprocally, with partners donating and receiving sperm at the same time. Other species convergently evolved to mate by hypodermic injection of sperm into the partner. We find that for orthologous transcripts annotated as expressed in the body region containing the testes, sequences from hypodermically inseminating species diverge more rapidly from the model species, Macrostomum lignano, and have a lower probability of being observed in other species. For other annotation categories, simpler models with a constant rate of similarity decay with increasing genetic distance from M. lignano match the observed patterns well. Thus, faster rates of sequence evolution for hypodermically inseminating species in testis-region genes result in higher rates of homology detection failure, yielding a signal of rapid evolution in sequence presence/absence patterns. Our results highlight the utility of considering appropriate null models for unobserved genes, as well as associating patterns of gene presence/absence with replicated evolutionary events in a phylogenetic context.
Affiliation(s)
- R Axel W Wiberg
- Zoological Institute, Department of Environmental Sciences, University of Basel, Basel, CH-4051, Switzerland; Evolutionary Biology, Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, SE-75236, Sweden
- Gudrun Viktorin
- Zoological Institute, Department of Environmental Sciences, University of Basel, Basel, CH-4051, Switzerland
- Lukas Schärer
- Zoological Institute, Department of Environmental Sciences, University of Basel, Basel, CH-4051, Switzerland
15
Mahilkar A, Raj N, Kemkar S, Saini S. Selection in a growing colony biases results of mutation accumulation experiments. Sci Rep 2022; 12:15470. [PMID: 36104390 PMCID: PMC9475022 DOI: 10.1038/s41598-022-19928-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2022] [Accepted: 09/06/2022] [Indexed: 11/11/2022]
Abstract
Mutations provide the raw material on which natural selection acts. Therefore, understanding the variety and relative frequency of different types of mutations is critical to understanding the nature of genetic diversity in a population. Mutation accumulation (MA) experiments have been used in this context to estimate parameters defining mutation rates, the distribution of fitness effects (DFE), and the spectrum of mutations. MA experiments can be performed with different effective population sizes. In MA experiments with bacteria, a single founder is grown to the size of a colony (~10^8 cells). It is assumed that natural selection plays a minimal role in dictating the dynamics of colony growth. In this work, we simulate colony growth via a mathematical model and use our model to mimic an MA experiment. We demonstrate that, because of selection, the fraction of all mutations that are beneficial is over-represented in an MA experiment by a factor of almost two, and that the distributions of fitness effects of beneficial and deleterious mutations are inaccurately captured. Given this, estimating mutation rates from MA experiments is non-trivial. We then perform an MA experiment with 160 lines of E. coli and show that, due to the effect of selection in a growing colony, the size and sector of the colony from which the experiment is propagated impact the results. Overall, we demonstrate that the results of MA experiments need to be revisited, taking into account the action of selection in a growing colony.
Affiliation(s)
- Anjali Mahilkar
- Department of Chemical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai, 400076, India
- Namratha Raj
- Department of Chemical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai, 400076, India
- Sharvari Kemkar
- Department of Chemical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai, 400076, India
- Supreet Saini
- Department of Chemical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai, 400076, India.
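The central argument of Mahilkar et al. is that selection during colony growth enriches beneficial mutations among the cells that are later sampled. A deliberately simplified deterministic model can illustrate this direction of bias. It is not the authors' model. It assumes synchronous generations, at most one mutation per lineage, mutations arising only from wild-type divisions, symmetric fitness effects, and hypothetical parameters (`mu`, `p_beneficial`, `s`). About 27 doublings take a single founder to roughly 10^8 cells.

```python
def beneficial_fraction_after_growth(generations: int, mu: float,
                                     p_beneficial: float, s: float) -> float:
    """Fraction of mutant cells carrying a beneficial mutation after colony growth.

    Simplified deterministic sketch with hypothetical parameters (not the model
    of Mahilkar et al.): wild-type cells double each generation, each daughter
    is a new mutant with probability `mu`, beneficial mutant lineages grow by
    2*(1 + s) and deleterious ones by 2*(1 - s) per generation.
    """
    wild, beneficial, deleterious = 1.0, 0.0, 0.0
    for _ in range(generations):
        new_mutants = 2.0 * wild * mu
        # Existing mutant lineages grow, then this generation's new mutants are added.
        beneficial = 2.0 * (1.0 + s) * beneficial + p_beneficial * new_mutants
        deleterious = 2.0 * (1.0 - s) * deleterious + (1.0 - p_beneficial) * new_mutants
        wild = 2.0 * wild * (1.0 - mu)
    return beneficial / (beneficial + deleterious)


if __name__ == "__main__":
    input_fraction = 0.1  # hypothetical share of new mutations that are beneficial
    for s in (0.0, 0.05, 0.2):
        observed = beneficial_fraction_after_growth(27, 1e-3, input_fraction, s)
        print(f"s={s}: beneficial fraction among mutant cells = {observed:.3f}")
```

With `s = 0`, the sampled fraction equals the input fraction. Any `s > 0` raises it, which matches the direction of bias the abstract describes. The size of the effect here depends entirely on the hypothetical parameters and should not be compared with the paper's factor of almost two.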
16
OverFlap PCR: A reliable approach for generating plasmid DNA libraries containing random sequences without a template bias. PLoS One 2022; 17:e0262968. [PMID: 35939421 PMCID: PMC9359533 DOI: 10.1371/journal.pone.0262968] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2022] [Accepted: 07/17/2022] [Indexed: 11/19/2022]
Abstract
Over the decades, biotechnology researchers have aimed to improve naturally occurring proteins and create novel ones. It is widely recognized that coupling protein sequence randomization with screening methods is one of the most powerful techniques for acquiring these desired improvements quickly, efficiently, and purposefully. Over the years, considerable advancements have been made in this field. However, the development of PCR-based or template-guided methods has been hampered by the template sequence biases they introduce. Here, we present a novel whole-plasmid amplification-based approach, which we named OverFlap PCR, for randomizing virtually any region of plasmid DNA without introducing a template sequence bias.
17
Abstract
"De novo" genes evolve from previously non-genic DNA. This strikes many of us as remarkable, because it seems extraordinarily unlikely that random sequence would produce a functional gene. How is this possible? In this two-part review, I first summarize what is known about the origins and molecular functions of the small number of de novo genes for which such information is available. I then speculate on what these examples may tell us about how de novo genes manage to emerge despite what seem like enormous opposing odds.
Affiliation(s)
- Caroline M Weisman
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA.
18
Multiple Levels of Triggered Factors and the Obligated Requirement of Cell-to-Cell Movement in the Mutation Repair of Cucumber Mosaic Virus with Defects in the tRNA-like Structure. BIOLOGY 2022; 11:biology11071051. [PMID: 36101429 PMCID: PMC9312275 DOI: 10.3390/biology11071051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2022] [Revised: 07/07/2022] [Accepted: 07/12/2022] [Indexed: 11/17/2022]
Abstract
Simple Summary: Based on analysis of tRNA-like structure (TLS) mutations in cucumber mosaic virus (CMV), mutation repair is correlated with several levels of triggering factors, including the inoculation dose of virus mutants, the quantity effect on the corresponding viral RNA, and the quality effect on the corresponding viral RNA. All types of TLS mutation in the different RNAs of CMV can be repaired at a low dose around the dilution end-point. At a high inoculation dose, TLS mutations in RNA2 and RNA3, but not RNA1, can be repaired, which correlates with the relative quantity defect of RNA2 or the genome size defect of RNA3. In addition, all of the above types of mutation repair necessarily require cell-to-cell movement, which demonstrates the obligatory role of cell-to-cell movement in mutation repair.
Abstract: Some debilitating mutations in RNA viruses are repairable; however, the factors that trigger mutation repair remain largely unknown. In this study, multiple triggering factors of mutation repair are identified based on genetic damage to the TLS in CMV. TLS mutations in different RNAs distinctively impact viral pathogenicity and present different types of mutation repair. The relative reduction in RNA2 level or the RNA3 sequence change resulting from TLS mutation is correlated with a high rate of mutation repair, whereas the TLS mutation of RNA1 fails to be repaired at the high inoculum dose. However, the TLS mutation of RNA1 can be repaired at a low inoculation dose, particularly around the dilution end-point, or in mixed inoculation with an RNA2 carrying a pre-termination mutation of the 2b gene, an RNAi suppressor. Taken together, TLS mutations resulting in quality or quantity defects of the viral genome, or TLS mutations at low doses around the dilution end-point, are likely to be repaired. Different levels of TLS mutation repair necessarily require cell-to-cell movement, implying its obligatory role in the evolution of low-fitness viruses and providing new insight into Muller’s ratchet. This study provides important information on virus evolution and the application of mild viral vaccines.
19
Kosinski LJ, Aviles NR, Gomez K, Masel J. Random peptides rich in small and disorder-promoting amino acids are less likely to be harmful. Genome Biol Evol 2022; 14:evac085. [PMID: 35668555 PMCID: PMC9210321 DOI: 10.1093/gbe/evac085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2021] [Revised: 04/01/2022] [Accepted: 05/27/2022] [Indexed: 11/15/2022]
Abstract
Proteins are the workhorses of the cell, yet they carry great potential for harm via misfolding and aggregation. Despite the dangers, proteins are sometimes born de novo from non-coding DNA. Proteins are more likely to be born from non-coding regions that produce peptides that do little to no harm when translated than from regions that produce harmful peptides. To investigate which newborn proteins are most likely to "first, do no harm", we estimate fitnesses from an experiment that competed Escherichia coli lineages that each expressed a unique random peptide. A variety of peptide metrics significantly predict lineage fitness, but this predictive power stems from simple amino acid frequencies rather than the ordering of amino acids. Amino acids that are smaller and that promote intrinsic structural disorder have more benign fitness effects. We validate that the amino acids that indicate benign effects in random peptides expressed in E. coli also do so in an independent dataset of random N-terminal tags in which it is possible to control for expression level. The same amino acids are also enriched in young animal proteins.
Affiliation(s)
- Luke J Kosinski
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, USA
- Nathan R Aviles
- Graduate Interdisciplinary Program in Statistics, University of Arizona, Tucson, USA
- Kevin Gomez
- Graduate Interdisciplinary Program in Applied Math, University of Arizona, Tucson, USA
- Joanna Masel
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, USA
20
Tretyachenko V, Vymětal J, Neuwirthová T, Vondrášek J, Fujishima K, Hlouchová K. Modern and prebiotic amino acids support distinct structural profiles in proteins. Open Biol 2022; 12:220040. [PMID: 35728622 PMCID: PMC9213115 DOI: 10.1098/rsob.220040] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 01/04/2023]
Abstract
The earliest proteins had to rely on amino acids available on early Earth before the biosynthetic pathways for more complex amino acids evolved. In extant proteins, a significant fraction of the 'late' amino acids (such as Arg, Lys, His, Cys, Trp and Tyr) belong to essential catalytic and structure-stabilizing residues. How (or if) early proteins could sustain an early biosphere has been a major puzzle. Here, we analysed two combinatorial protein libraries representing proxies of the available sequence space at two different evolutionary stages. The first is composed of the entire alphabet of 20 amino acids while the second one consists of only 10 residues (ASDGLIPTEV) representing a consensus view of plausibly available amino acids through prebiotic chemistry. We show that compact conformations resistant to proteolysis are surprisingly similarly abundant in both libraries. In addition, the early alphabet proteins are inherently more soluble and refoldable, independent of the general Hsp70 chaperone activity. By contrast, chaperones significantly increase the otherwise poor solubility of the modern alphabet proteins suggesting their coevolution with the amino acid repertoire. Our work indicates that while both early and modern amino acids are predisposed to supporting protein structure, they do so with different biophysical properties and via different mechanisms.
Affiliation(s)
- Vyacheslav Tretyachenko
- Department of Cell Biology, Faculty of Science, Charles University, Prague 12843, Czech Republic; Department of Biochemistry, Faculty of Science, Charles University, Prague 12843, Czech Republic
- Jiří Vymětal
- Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague 16610, Czech Republic
- Tereza Neuwirthová
- Department of Cell Biology, Faculty of Science, Charles University, Prague 12843, Czech Republic
- Jiří Vondrášek
- Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague 16610, Czech Republic
- Kosuke Fujishima
- Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo 1528550, Japan; Graduate School of Media and Governance, Keio University, Fujisawa 2520882, Japan
- Klára Hlouchová
- Department of Cell Biology, Faculty of Science, Charles University, Prague 12843, Czech Republic; Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague 16610, Czech Republic
21
Identification of antimicrobial peptides from the human gut microbiome using deep learning. Nat Biotechnol 2022; 40:921-931. [PMID: 35241840 DOI: 10.1038/s41587-022-01226-0] [Citation(s) in RCA: 127] [Impact Index Per Article: 63.5] [Received: 07/06/2021] [Accepted: 01/19/2022] [Indexed: 02/07/2023]
Abstract
The human gut microbiome encodes a large variety of antimicrobial peptides (AMPs), but the short lengths of AMPs pose a challenge for computational prediction. Here we combined multiple natural language processing neural network models, including LSTM, Attention and BERT, to form a unified pipeline for candidate AMP identification from human gut microbiome data. Of 2,349 sequences identified as candidate AMPs, 216 were chemically synthesized, with 181 showing antimicrobial activity (a positive rate of >83%). Most of these peptides have less than 40% sequence homology to AMPs in the training set. Further characterization of the 11 most potent AMPs showed high efficacy against antibiotic-resistant, Gram-negative pathogens and significant efficacy in lowering bacterial load by more than tenfold in a mouse model of bacterial lung infection. Our study showcases the potential of machine learning approaches for mining functional peptides from metagenome data and accelerating the discovery of promising AMP candidate molecules for in-depth investigations.
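The hit rate quoted in this abstract follows directly from the reported counts. A quick arithmetic check (the variable names are illustrative only):

```python
candidates = 2349   # sequences predicted as candidate AMPs
synthesized = 216   # candidates chemically synthesized and tested
active = 181        # synthesized peptides showing antimicrobial activity

positive_rate = active / synthesized
print(f"positive rate = {positive_rate:.1%}")  # 83.8%, consistent with ">83%"
print(f"share of candidates synthesized = {synthesized / candidates:.1%}")  # 9.2%
```

The rate is computed over the synthesized peptides, not over all 2,349 candidates.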
22
Heinen T, Xie C, Keshavarz M, Stappert D, Künzel S, Tautz D. Evolution of a New Testis-Specific Functional Promoter Within the Highly Conserved Map2k7 Gene of the Mouse. Front Genet 2022; 12:812139. [PMID: 35069705 PMCID: PMC8766832 DOI: 10.3389/fgene.2021.812139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2021] [Accepted: 12/08/2021] [Indexed: 12/03/2022]
Abstract
Map2k7 (synonym Mkk7) is a conserved regulatory kinase gene and a central component of the JNK signaling cascade with key functions during cellular differentiation. It shows complex transcription patterns, and different transcript isoforms are known in the mouse (Mus musculus). We have previously identified a newly evolved testis-specific transcript for the Map2k7 gene in the subspecies M. m. domesticus. Here, we identify the new promoter that drives this transcript and find that it codes for an open reading frame (ORF) of 50 amino acids. The new promoter was gained in the stem lineage of closely related mouse species but was secondarily lost in the subspecies M. m. musculus and M. m. castaneus. A single mutation can be correlated with its transcriptional activity in M. m. domesticus, and cell culture assays demonstrate the capability of this mutation to drive expression. A mouse knockout line in which the promoter region of the new transcript is deleted reveals a functional contribution of the newly evolved promoter to sperm motility and the spermatid transcriptome. Our data show that a new functional transcript (and possibly protein) can evolve within an otherwise highly conserved gene, supporting the notion of regulatory changes contributing to the emergence of evolutionary novelties.
Affiliation(s)
- Chen Xie
- Max Planck Institute for Evolutionary Biology, Plön, Germany
- Maryam Keshavarz
- Max Planck Institute for Evolutionary Biology, Plön, Germany; Deutsches Zentrum für Neurodegenerative Erkrankungen e. V. (DZNE), Bonn, Germany
- Dominik Stappert
- Deutsches Zentrum für Neurodegenerative Erkrankungen e. V. (DZNE), Bonn, Germany
- Sven Künzel
- Max Planck Institute for Evolutionary Biology, Plön, Germany
- Diethard Tautz
- Max Planck Institute for Evolutionary Biology, Plön, Germany
23
Dingle K, Ghaddar F, Šulc P, Louis AA. Phenotype Bias Determines How Natural RNA Structures Occupy the Morphospace of All Possible Shapes. Mol Biol Evol 2022; 39:msab280. [PMID: 34542628 PMCID: PMC8763027 DOI: 10.1093/molbev/msab280] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Indexed: 12/15/2022]
Abstract
Morphospaces (representations of phenotypic characteristics) are often populated unevenly, leaving large parts unoccupied. Such patterns are typically ascribed to contingency, or else to natural selection disfavoring certain parts of the morphospace. The extent to which developmental bias, the tendency of certain phenotypes to preferentially appear as potential variation, also explains these patterns is hotly debated. Here we demonstrate quantitatively that developmental bias is the primary explanation for the occupation of the morphospace of RNA secondary structure (SS) shapes. Upon random mutations, some RNA SS shapes (the frequent ones) are much more likely to appear than others. By using the RNAshapes method to define coarse-grained SS classes, we can directly compare the frequencies with which noncoding RNA SS shapes appear in the RNAcentral database to frequencies obtained upon a random sampling of sequences. We show that: 1) only the most frequent structures appear in nature; the vast majority of possible structures in the morphospace have not yet been explored; 2) remarkably small numbers of random sequences are needed to produce all the RNA SS shapes found in nature so far; and 3) perhaps most surprisingly, the natural frequencies are accurately predicted, over several orders of magnitude in variation, by the likelihood that structures appear upon a uniform random sampling of sequences. The ultimate cause of these patterns is not natural selection, but rather a strong phenotype bias in the RNA genotype-phenotype map, a type of developmental bias or "findability constraint," which limits evolutionary dynamics to a hugely reduced subset of structures that are easy to "find."
Affiliation(s)
- Kamaludin Dingle
- Centre for Applied Mathematics and Bioinformatics, Department of Mathematics and Natural Sciences, Gulf University for Science and Technology, Hawally, Kuwait
- Fatme Ghaddar
- Centre for Applied Mathematics and Bioinformatics, Department of Mathematics and Natural Sciences, Gulf University for Science and Technology, Hawally, Kuwait
- Petr Šulc
- School of Molecular Sciences and Center for Molecular Design and Biomimetics at the Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Ard A Louis
- Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford, United Kingdom
24
Papadopoulos C, Chevrollier N, Lopes A. Exploring the Peptide Potential of Genomes. Methods Mol Biol 2022; 2405:63-82. [PMID: 35298808 DOI: 10.1007/978-1-0716-1855-4_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Recent studies attribute a central role to the noncoding genome in the emergence of novel genes. The widespread transcription of noncoding regions and the pervasive translation of the resulting RNAs offer organisms a vast reservoir of novel peptides. Although most of these peptides are expected to be deleterious or neutral, and thus to be degraded right away or to be short-lived in evolutionary history, some of them can confer an advantage to the organism. These can be further subjected to natural selection and become established as novel genes. In any case, characterizing the structural properties of these pervasively translated peptides is crucial to understanding (1) their impact on the cell and (2) how some of these peptides, derived from presumed noncoding regions, can give rise to structured and functional de novo proteins. We therefore present a protocol that aims to explore the potential of a genome to produce novel peptides. It consists of annotating all the open reading frames (ORFs) of a genome (i.e., both coding and noncoding ones) and characterizing the fold potential and other structural properties of their corresponding potential peptides. Here, we apply our protocol to a small genome and show how to apply it to very large genomes. Finally, we present a case study that probes the fold potential of a set of 721 translated ORFs in mouse lncRNAs, identified with ribosome profiling experiments. Interestingly, we show that the distribution of their fold potential differs from that of the nontranslated lncRNAs and, more generally, from that of the other noncoding ORFs of the mouse.
Affiliation(s)
- Chris Papadopoulos
- Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, Gif-sur-Yvette, cedex, France
- Nicolas Chevrollier
- Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, Gif-sur-Yvette, cedex, France
- Anne Lopes
- Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, Gif-sur-Yvette, cedex, France.
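The first step of the protocol described by Papadopoulos et al. is to annotate every ORF of a genome on both strands. That step can be sketched as a six-frame scan. This is a minimal illustration under stated assumptions, not the authors' tool. It treats an ORF as an ATG followed by the first in-frame stop codon (TAA, TAG, TGA), applies a hypothetical minimum length `min_codons`, and ignores alternative start codons and other ORF definitions (for example, stop-to-stop). Coordinates on the minus strand refer to the reverse-complemented sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[str, int, int, int]]:
    """Return (strand, frame, start, end) for ATG...stop ORFs in all six frames.

    `end` is exclusive and includes the stop codon. `min_codons` counts the
    codons before the stop and is a hypothetical threshold, not the authors'.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # resume searching for the next ATG downstream
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGCC", min_codons=3))  # [('+', 2, 2, 14)]
```

A protocol like the one described would then translate each ORF and pass the peptides to structure and fold-potential predictors. Those steps are outside this sketch.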
25
Bhave D, Tautz D. Effects of the Expression of Random Sequence Clones on Growth and Transcriptome Regulation in Escherichia coli. Genes (Basel) 2021; 13:genes13010053. [PMID: 35052392 PMCID: PMC8775113 DOI: 10.3390/genes13010053] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/23/2021] [Revised: 12/21/2021] [Accepted: 12/21/2021] [Indexed: 02/04/2023]
Abstract
Comparative genomic analyses have provided evidence that new genetic functions can emerge out of random nucleotide sequences. Here, we apply a direct experimental approach to study the effects of plasmids harboring random sequence inserts under the control of an inducible promoter. Based on data from previously described experiments dealing with the growth of clones within whole libraries, we extracted specific clones that had shown either negative, neutral or positive effects on relative cell growth. We analyzed these individually with respect to growth characteristics and the impact on the transcriptome. We find that candidate clones for negative peptides lead to growth arrest by eliciting a general stress response. Overexpression of positive clones, on the other hand, does not change the exponential growth rates of hosts, and they show a growth advantage over a neutral clone when tested in direct competition experiments. Transcriptomic changes in positive clones are relatively moderate and specific to each clone. We conclude from our experiments that random sequence peptides are indeed a suitable source for the de novo evolution of genetic functions.
26
Li J, Singh U, Bhandary P, Campbell J, Arendsee Z, Seetharam AS, Wurtele ES. Foster thy young: enhanced prediction of orphan genes in assembled genomes. Nucleic Acids Res 2021; 50:e37. [PMID: 34928390 PMCID: PMC9023268 DOI: 10.1093/nar/gkab1238] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 08/27/2021] [Revised: 10/22/2021] [Accepted: 12/02/2021] [Indexed: 02/06/2023]
Abstract
Proteins encoded by newly emerged genes ('orphan genes') share no sequence similarity with proteins in any other species. They provide organisms with a reservoir of genetic elements to quickly respond to changing selection pressures. Here, we systematically assess the ability of five gene prediction pipelines to accurately predict genes in genomes according to phylostratal origin. BRAKER and MAKER are existing, popular ab initio tools that infer gene structures by machine learning. Direct Inference is an evidence-based pipeline we developed to predict gene structures from alignments of RNA-Seq data. The BIND pipeline integrates the ab initio predictions of BRAKER with Direct Inference; MIND combines Direct Inference and MAKER predictions. We use highly curated Arabidopsis and yeast annotations as gold-standard benchmarks, and cross-validate in rice. Each pipeline under-predicts orphan genes (recovering as few as 11% of them under one prediction scenario). Increasing RNA-Seq diversity greatly improves prediction efficacy. The combined methods (BIND and MIND) yield the best predictions overall, with BIND identifying 68% of annotated orphan genes and 99% of ancient genes, and giving the highest sensitivity score regardless of dataset in Arabidopsis. We provide a lightweight, flexible, reproducible, and well-documented solution to improve gene prediction.
Affiliation(s)
- Jing Li
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50014, USA; Center for Metabolic Biology, Iowa State University, Ames, IA 50014, USA; Genetics and Genomics Graduate Program, Iowa State University, Ames, IA 50014, USA
- Urminder Singh
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50014, USA; Center for Metabolic Biology, Iowa State University, Ames, IA 50014, USA; Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50014, USA
- Priyanka Bhandary
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50014, USA; Center for Metabolic Biology, Iowa State University, Ames, IA 50014, USA; Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50014, USA
- Jacqueline Campbell
- Corn Insects and Crop Genetics Research Unit, US Department of Agriculture, Agricultural Research Service, Ames, IA 50014, USA
- Zebulun Arendsee
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50014, USA; Center for Metabolic Biology, Iowa State University, Ames, IA 50014, USA; Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50014, USA
- Arun S Seetharam
- Genome Informatics Facility, Iowa State University, Ames, IA 50014, USA
- Eve Syrkin Wurtele
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50014, USA; Center for Metabolic Biology, Iowa State University, Ames, IA 50014, USA; Genetics and Genomics Graduate Program, Iowa State University, Ames, IA 50014, USA; Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50014, USA
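The percentages in the Li et al. abstract (for example, 68% of annotated orphan genes and 99% of ancient genes) can be read as sensitivities computed per phylostratum: the fraction of annotated genes in a class that a pipeline recovers. The sketch below assumes that predicted gene models have already been matched to annotated gene IDs. Real benchmarks typically match models by coordinate overlap, which is omitted here. The gene IDs and stratum labels are hypothetical.

```python
from collections import Counter


def sensitivity_by_stratum(annotated: dict[str, str], recovered: set[str]) -> dict[str, float]:
    """Sensitivity, TP / (TP + FN), for each phylostratum.

    annotated: gene ID -> phylostratum label from the gold-standard annotation.
    recovered: IDs of annotated genes that a pipeline predicted correctly.
    """
    totals = Counter(annotated.values())
    hits = Counter(stratum for gene, stratum in annotated.items() if gene in recovered)
    return {stratum: hits[stratum] / n for stratum, n in totals.items()}


if __name__ == "__main__":
    gold = {"g1": "orphan", "g2": "orphan", "g3": "orphan", "g4": "orphan",
            "g5": "ancient", "g6": "ancient"}
    print(sensitivity_by_stratum(gold, {"g1", "g5", "g6"}))
    # {'orphan': 0.25, 'ancient': 1.0}
```

Reporting sensitivity per stratum, rather than pooled, shows how a pipeline can look accurate overall while missing most young genes.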
27
Papadopoulos C, Callebaut I, Gelly JC, Hatin I, Namy O, Renard M, Lespinet O, Lopes A. Intergenic ORFs as elementary structural modules of de novo gene birth and protein evolution. Genome Res 2021; 31:2303-2315. [PMID: 34810219 PMCID: PMC8647833 DOI: 10.1101/gr.275638.121] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/13/2021] [Accepted: 09/23/2021] [Indexed: 01/08/2023]
Abstract
The noncoding genome plays an important role in de novo gene birth and in the emergence of genetic novelty. Nevertheless, how the properties of noncoding sequences could promote the birth of novel genes and shape the evolution and structural diversity of proteins remains unclear. Therefore, by combining different bioinformatic approaches, we characterized the fold potential diversity of the amino acid sequences encoded by all intergenic open reading frames (ORFs) of S. cerevisiae with the aim of (1) exploring whether the diversity of structural states in proteomes is already present in noncoding sequences, and (2) estimating the potential of the noncoding genome to produce novel protein bricks that could either give rise to novel genes or be integrated into pre-existing proteins, thus participating in protein structure diversity and evolution. We showed that the amino acid sequences encoded by most yeast intergenic ORFs contain the elementary building blocks of protein structures. Moreover, they encompass the large structural state diversity of canonical proteins, with the majority predicted to be foldable. We then investigated the early stages of de novo gene birth by reconstructing the ancestral sequences of 70 yeast de novo genes and characterized the sequence and structural properties of intergenic ORFs with a strong translation signal. This enabled us to highlight sequence and structural factors determining de novo gene emergence. Finally, we showed a strong correlation between the fold potential of de novo proteins and that of their ancestral amino acid sequences, reflecting the relationship between the noncoding genome and the protein structure universe.
Affiliation(s)
- Chris Papadopoulos
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Isabelle Callebaut
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, 75005 Paris, France
- Jean-Christophe Gelly
- Université de Paris, Biologie Intégrée du Globule Rouge, UMR_S1134, BIGR, INSERM, F-75015 Paris, France
- Laboratoire d'Excellence GR-Ex, 75015 Paris, France
- Institut National de la Transfusion Sanguine, F-75015 Paris, France
- Isabelle Hatin
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Olivier Namy
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Maxime Renard
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Olivier Lespinet
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Anne Lopes
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
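The Papadopoulos et al. study above starts from the set of all intergenic ORFs of S. cerevisiae. A minimal Python sketch of how ORFs can be enumerated in a sequence, assuming an ORF is an ATG-to-stop stretch in any of the six reading frames with at least `min_codons` sense codons. `find_orfs` and `min_codons` are hypothetical names; the paper's own ORF definition and intergenic filtering may differ.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[str, int, str]]:
    """Return (strand, frame, orf) for every ATG...stop ORF in the six frames.

    The ORF string includes the stop codon and is given in the strand's own
    orientation. Within a frame, the first ATG after the previous stop opens
    an ORF, so in-frame internal ATGs are folded into the longest ORF.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    # number of sense codons, excluding the stop codon
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, s[start:i + 3]))
                    start = None
    return orfs
```

For example, `find_orfs("ATGAAATAG")` reports a single ORF on the forward strand in frame 0, while the same ORF written as its reverse complement, `"CTATTTCAT"`, is reported on the `-` strand.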
28
Castro JF, Tautz D. The Effects of Sequence Length and Composition of Random Sequence Peptides on the Growth of E. coli Cells. Genes (Basel) 2021; 12:1913. [PMID: 34946861 PMCID: PMC8702183 DOI: 10.3390/genes12121913] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/17/2021] [Revised: 11/22/2021] [Accepted: 11/26/2021] [Indexed: 12/21/2022]
Abstract
We study the potential for the de novo evolution of genes from random nucleotide sequences using libraries of E. coli expressing random sequence peptides. We assess the effects of such peptides on cell growth by monitoring frequency changes in individual clones in a complex library through four serial passages. Using a new analysis pipeline that allows the tracing of peptides of all lengths, we find that over half of the peptides have consistent effects on cell growth. Across nine different experiments, around 16% of clones increase in frequency and 36% decrease, with some variation between individual experiments. Shorter peptides (8-20 residues) are more likely to increase in frequency; longer ones are more likely to decrease. GC content, amino acid composition, intrinsic disorder, and aggregation propensity show slightly different patterns between peptide groups. Sequences that increase in frequency tend to be more disordered, with lower aggregation propensity. This coincides with the observation that young genes with more disordered structures are better tolerated in genomes. Our data indicate that random sequences can be a source of evolutionary innovation, since a large fraction of them are well tolerated by the cells or can provide a growth advantage.
Affiliation(s)
- Diethard Tautz
- Max Planck Institute for Evolutionary Biology, August-Thienemann Strasse 2, 24306 Plön, Germany
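The Castro and Tautz abstract above classifies clones by how their frequency changes through four serial passages. A minimal Python sketch of one way to make such a call from read counts; the pseudocount, the log2 threshold `min_log2fc`, and the "same direction at every passage step" rule are illustrative assumptions, not the paper's analysis pipeline, and `classify_clones` is a hypothetical name.

```python
import math


def frequencies(counts):
    """Convert read counts per clone into relative frequencies."""
    total = sum(counts.values())
    return {clone: n / total for clone, n in counts.items()}


def classify_clones(passages, min_log2fc=1.0, pseudocount=0.5):
    """Label each clone as 'increase', 'decrease' or 'no consistent effect'.

    passages: list of dicts (clone -> read count), one per passage, in order.
    A clone is called only if its frequency moves in the same direction at every
    step and its overall log2 fold change reaches min_log2fc in magnitude.
    """
    clones = set().union(*passages)
    # The pseudocount keeps ratios finite for clones that drop to zero reads.
    freqs = [frequencies({c: p.get(c, 0) + pseudocount for c in clones}) for p in passages]
    labels = {}
    for c in clones:
        steps = [math.log2(freqs[i + 1][c] / freqs[i][c]) for i in range(len(freqs) - 1)]
        overall = math.log2(freqs[-1][c] / freqs[0][c])
        if all(s > 0 for s in steps) and overall >= min_log2fc:
            labels[c] = "increase"
        elif all(s < 0 for s in steps) and overall <= -min_log2fc:
            labels[c] = "decrease"
        else:
            labels[c] = "no consistent effect"
    return labels
```

Because frequencies are relative, a clone whose raw count stays flat can still decline in frequency when other clones expand; the threshold decides whether that decline is called.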
29
Roy S, Sengupta S. Evolution towards increasing complexity through functional diversification in a protocell model of the RNA world. Proc Biol Sci 2021; 288:20212098. [PMID: 34784760 PMCID: PMC8596018 DOI: 10.1098/rspb.2021.2098] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/22/2021] [Accepted: 10/21/2021] [Indexed: 11/12/2022]
Abstract
The encapsulation of genetic material inside compartments together with the creation and sustenance of functionally diverse internal components are likely to have been key steps in the formation of 'live', replicating protocells in an RNA world. Several experiments have shown that RNA encapsulated inside lipid vesicles can lead to vesicular growth and division through physical processes alone. Replication of RNA inside such vesicles can produce a large number of RNA strands. Yet, the impact of such replication processes on the emergence of the first ribozymes inside such protocells and on the subsequent evolution of the protocell population remains an open question. In this paper, we present a model for the evolution of protocells with functionally diverse ribozymes. Distinct ribozymes can be created with small probabilities during the error-prone RNA replication process via the rolling circle mechanism. We identify the conditions that can synergistically enhance the number of different ribozymes inside a protocell and allow functionally diverse protocells containing multiple ribozymes to dominate the population. Our work demonstrates the existence of an effective pathway towards increasing complexity of protocells that might have eventually led to the origin of life in an RNA world.
Affiliation(s)
- Suvam Roy
- Department of Physical Sciences, Indian Institute of Science Education and Research Kolkata, Mohanpur-741246, India
- Supratim Sengupta
- Department of Physical Sciences, Indian Institute of Science Education and Research Kolkata, Mohanpur-741246, India
30
Li J, Singh U, Arendsee Z, Wurtele ES. Landscape of the Dark Transcriptome Revealed Through Re-mining Massive RNA-Seq Data. Front Genet 2021; 12:722981. [PMID: 34484307 PMCID: PMC8415361 DOI: 10.3389/fgene.2021.722981] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/09/2021] [Accepted: 07/26/2021] [Indexed: 12/13/2022]
Abstract
The "dark transcriptome" can be considered the multitude of sequences that are transcribed but not annotated as genes. We evaluated expression of 6,692 annotated genes and 29,354 unannotated open reading frames (ORFs) in the Saccharomyces cerevisiae genome across diverse environmental, genetic and developmental conditions (3,457 RNA-Seq samples). Over 30% of the highly transcribed ORFs have translation evidence. Phylostratigraphic analysis infers that most of these transcribed ORFs would encode species-specific proteins ("orphan-ORFs"); hundreds have mean expression comparable to that of annotated genes. These data reveal the unannotated ORFs most likely to be protein-coding genes. We partitioned a co-expression matrix by Markov Chain Clustering; the resultant clusters contain 2,468 orphan-ORFs. We provide the aggregated RNA-Seq yeast data with extensive metadata as a project in MetaOmGraph (MOG), a tool designed for interactive analysis and visualization. This approach enables reuse of public RNA-Seq data for exploratory discovery, providing a rich context for experimentalists to make novel, experimentally testable hypotheses about candidate genes.
Affiliation(s)
- Jing Li
- Genetics and Genomics Graduate Program, Iowa State University, Ames, IA, United States
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Urminder Singh
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
- Zebulun Arendsee
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
- Eve Syrkin Wurtele
- Genetics and Genomics Graduate Program, Iowa State University, Ames, IA, United States
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
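The Li et al. abstract above partitions a co-expression matrix by Markov clustering. A minimal pure-Python sketch of the standard MCL loop (expansion by matrix squaring, inflation by element-wise powering, column renormalization), assuming the input is a symmetric, non-negative similarity matrix such as a thresholded correlation matrix. It is a dense illustration with hypothetical function names, not the authors' pipeline; production MCL implementations also prune small entries for speed.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix), in place."""
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                m[i][j] /= s
    return m


def matmul(a, b):
    """Dense square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Markov clustering; returns clusters as frozensets of node indices."""
    n = len(adjacency)
    # Add self-loops (a common MCL convention), then make columns sum to 1.
    m = normalize_columns([[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                           for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: two steps of the random walk
        inflated = normalize_columns([[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that retain mass are attractors; each lists the members of its cluster.
    clusters = []
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

Raising `inflation` sharpens the contrast between strong and weak flows and so tends to yield more, smaller clusters.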
31
Lyko P, Wicke S. Genomic reconfiguration in parasitic plants involves considerable gene losses alongside global genome size inflation and gene births. Plant Physiol 2021; 186:1412-1423. [PMID: 33909907 PMCID: PMC8260112 DOI: 10.1093/plphys/kiab192] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/29/2020] [Accepted: 04/13/2021] [Indexed: 05/02/2023]
Abstract
Parasitic plant genomes and transcriptomes reveal numerous genetic innovations, the functional-evolutionary relevance and roles of which open unprecedented research avenues.
Affiliation(s)
- Peter Lyko
- Institute for Biology, Humboldt-University of Berlin, Germany
- Susann Wicke
- Institute for Biology, Humboldt-University of Berlin, Germany
32
Tretyachenko V, Voráček V, Souček R, Fujishima K, Hlouchová K. CoLiDe: Combinatorial Library Design tool for probing protein sequence space. Bioinformatics 2021; 37:482-489. [PMID: 32956450 PMCID: PMC8088326 DOI: 10.1093/bioinformatics/btaa804] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/02/2020] [Revised: 07/28/2020] [Accepted: 09/07/2020] [Indexed: 11/14/2022]
Abstract
MOTIVATION Current techniques of protein engineering focus mostly on re-designing small targeted regions or defined structural scaffolds rather than constructing combinatorial libraries of versatile compositions and lengths. This is a missed opportunity because combinatorial libraries are emerging as a vital source of novel functional proteins and are of interest in diverse research areas. RESULTS Here, we present a computational tool for Combinatorial Library Design (CoLiDe) offering precise control over protein sequence composition, length and diversity. The algorithm uses an evolutionary approach to find combinatorial libraries of degenerate DNA templates. We demonstrate its performance and precision using four different input alphabet distributions on different sequence lengths. In addition, a model design and experimental pipeline for protein library expression and purification is presented, providing a proof of concept that our protocol can be used to prepare purified protein library samples of up to 10^11-10^12 unique sequences. CoLiDe presents a composition-centric approach to protein design towards different functional phenomena. AVAILABILITY AND IMPLEMENTATION CoLiDe is implemented in Python and freely available at https://github.com/voracva1/CoLiDe. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Vyacheslav Tretyachenko
- Department of Cell Biology, Faculty of Science, Charles University, Biocev, Prague, Czech Republic
- Department of Biochemistry, Faculty of Science, Charles University, 128 00 Prague 2, Czech Republic
- Václav Voráček
- Department of Cybernetics, Center for Machine Perception, Faculty of Electrical Engineering, Czech Technical University in Prague, 166 27 Prague, Czech Republic
- Radko Souček
- Institute of Organic Chemistry and Biochemistry IOCB Research Centre & Gilead Sciences, Academy of Sciences of the Czech Republic, 166 10 Prague, Czech Republic
- Kosuke Fujishima
- Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo 1528550, Japan
- Klára Hlouchová
- Department of Cell Biology, Faculty of Science, Charles University, Biocev, Prague, Czech Republic
- Institute of Organic Chemistry and Biochemistry IOCB Research Centre & Gilead Sciences, Academy of Sciences of the Czech Republic, 166 10 Prague, Czech Republic
33
Miller RV, Neme R, Clay DM, Pathmanathan JS, Lu MW, Yerlici VT, Khurana JS, Landweber LF. Transcribed germline-limited coding sequences in Oxytricha trifallax. G3 (Bethesda) 2021; 11:6192809. [PMID: 33772542 PMCID: PMC8495736 DOI: 10.1093/g3journal/jkab092] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/05/2020] [Accepted: 02/26/2021] [Indexed: 01/13/2023]
Abstract
The germline-soma divide is a fundamental distinction in developmental biology, and different genes are expressed in germline and somatic cells throughout metazoan life cycles. Ciliates, a group of microbial eukaryotes, exhibit germline-somatic nuclear dimorphism within a single cell with two different genomes. The ciliate Oxytricha trifallax undergoes massive RNA-guided DNA elimination and genome rearrangement to produce a new somatic macronucleus (MAC) from a copy of the germline micronucleus (MIC). This process eliminates noncoding DNA sequences that interrupt genes and also deletes hundreds of germline-limited open reading frames (ORFs) that are transcribed during genome rearrangement. Here, we update the set of transcribed germline-limited ORFs (TGLOs) in O. trifallax. We show that TGLOs tend to be expressed during nuclear development and then are absent from the somatic MAC. We also demonstrate that exposure to synthetic RNA can reprogram TGLO retention in the somatic MAC and that TGLO retention leads to transcription outside the normal developmental program. These data suggest that TGLOs represent a group of developmentally regulated protein-coding sequences whose gene expression is terminated by DNA elimination.
Affiliation(s)
- Richard V Miller
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
- Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA
- Rafik Neme
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
- Derek M Clay
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
- Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA
- Jananan S Pathmanathan
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
- Michael W Lu
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- V Talya Yerlici
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
- Jaspreet S Khurana
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
- Laura F Landweber
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
34
Knopp M, Babina AM, Gudmundsdóttir JS, Douglass MV, Trent MS, Andersson DI. A novel type of colistin resistance genes selected from random sequence space. PLoS Genet 2021; 17:e1009227. [PMID: 33411736 PMCID: PMC7790251 DOI: 10.1371/journal.pgen.1009227] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 08/10/2020] [Accepted: 10/27/2020] [Indexed: 11/29/2022]
Abstract
Antibiotic resistance is a rapidly increasing medical problem that severely limits the success of antibiotic treatments, and the identification of resistance determinants is key for surveillance and control of resistance dissemination. Horizontal transfer is the dominant mechanism for the spread of resistance genes between bacteria, but little is known about the original emergence of resistance genes. Here, we examined experimentally whether random sequences can generate novel antibiotic resistance determinants de novo. By utilizing highly diverse expression libraries encoding random sequences to select for open reading frames that confer resistance to the last-resort antibiotic colistin in Escherichia coli, six de novo colistin resistance-conferring peptides (Dcr) were identified. The peptides act via direct interactions with the sensor kinase PmrB (also termed BasS in E. coli), causing activation of the PmrAB two-component system (TCS), modification of the lipid A domain of lipopolysaccharide and subsequent colistin resistance. This kinase activation was extended to other TCS by generation of chimeric sensor kinases. Our results demonstrate that peptides with novel activities mediated via specific peptide-protein interactions in the transmembrane domain of a sensory transducer can be selected de novo, suggesting that the origination of such peptides from non-coding regions is conceivable. In addition, we identified a novel class of resistance determinants for a key antibiotic that is used as a last-resort treatment for several significant pathogens. The high-level resistance provided at low expression levels, absence of significant growth defects and the functionality of Dcr peptides across different genera suggest that this class of peptides could potentially evolve as bona fide resistance determinants in natura.
We expressed over 100 million randomly generated DNA sequences in Escherichia coli and selected 6 variants that encode peptides that provide resistance to the last-resort antibiotic colistin. We show that the selected peptides are auxiliary activators of the two-component system PmrAB, and that resistance is mediated via modifications of the cell envelope causing decreased antibiotic uptake. This is the first example where random expression libraries have been employed to select for peptides that perform an activating function by direct peptide-protein interactions in vivo, adding support to the idea that non-coding DNA can serve as a substrate for de novo gene evolution. Additionally, the described peptides expand the narrow list of colistin resistance genes and further analyses of clinical isolates will be necessary to determine if similar resistance determinants have evolved in natura.
Affiliation(s)
- Michael Knopp
- Department of Medical Biochemistry and Microbiology, Uppsala University, Sweden
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Arianne M. Babina
- Department of Medical Biochemistry and Microbiology, Uppsala University, Sweden
- Martin V. Douglass
- Department of Infectious Diseases, College of Veterinary Medicine, University of Georgia, Georgia, United States of America
- M. Stephen Trent
- Department of Infectious Diseases, College of Veterinary Medicine, University of Georgia, Georgia, United States of America
- Department of Microbiology, Franklin College of Arts and Sciences, University of Georgia, Georgia, United States of America
- Dan I. Andersson
- Department of Medical Biochemistry and Microbiology, Uppsala University, Sweden
35
Zile K, Dessimoz C, Wurm Y, Masel J. Only a Single Taxonomically Restricted Gene Family in the Drosophila melanogaster Subgroup Can Be Identified with High Confidence. Genome Biol Evol 2020; 12:1355-1366. [PMID: 32589737 PMCID: PMC8059200 DOI: 10.1093/gbe/evaa127] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Accepted: 06/19/2020] [Indexed: 12/12/2022]
Abstract
Taxonomically restricted genes (TRGs) are genes that are present only in one clade. Protein-coding TRGs may evolve de novo from previously noncoding sequences: functional ncRNA, introns, alternative reading frames of older protein-coding genes, or intergenic sequences. A major challenge in studying de novo genes is the need to avoid both false positives (nonfunctional open reading frames and/or functional genes that did not arise de novo) and false negatives. Here, we search conservatively for high-confidence TRGs as the most promising candidates for experimental studies, ensuring functionality through conservation across at least two species, and ensuring de novo status through examination of homologous noncoding sequences. Our pipeline also avoids ascertainment biases associated with preconceptions of how de novo genes are born. We identify one TRG family that evolved de novo in the Drosophila melanogaster subgroup. This TRG family contains single-copy genes in Drosophila simulans and Drosophila sechellia. It originated in an intron of a well-established gene, sharing that intron with another well-established gene upstream. These TRGs contain an intron that predates their open reading frame. These genes have not been previously reported as de novo originated, and to our knowledge, they are the best Drosophila candidates identified so far for experimental studies aimed at elucidating the properties of de novo genes.
Affiliation(s)
- Karina Zile
- Division of Biosciences, University College London, United Kingdom
- Christophe Dessimoz
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, Switzerland
- Department of Genetics, Evolution and Environment, University College London, United Kingdom
- Department of Computer Science, University College London, United Kingdom
- Yannick Wurm
- School of Biological and Chemical Sciences, Queen Mary University of London, United Kingdom
- Alan Turing Institute, London, United Kingdom
- Joanna Masel
- Department of Ecology and Evolutionary Biology, University of Arizona
36
The application of biomacromolecules to improve oral absorption by enhanced intestinal permeability: A mini-review. Chin Chem Lett 2020. [DOI: 10.1016/j.cclet.2020.02.035] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 12/17/2022]
37
38
Vakirlis N, Acar O, Hsu B, Castilho Coelho N, Van Oss SB, Wacholder A, Medetgul-Ernar K, Bowman RW, Hines CP, Iannotta J, Parikh SB, McLysaght A, Camacho CJ, O'Donnell AF, Ideker T, Carvunis AR. De novo emergence of adaptive membrane proteins from thymine-rich genomic sequences. Nat Commun 2020; 11:781. [PMID: 32034123 PMCID: PMC7005711 DOI: 10.1038/s41467-020-14500-z] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Received: 04/25/2019] [Accepted: 12/20/2019] [Indexed: 11/14/2022]
Abstract
Recent evidence demonstrates that novel protein-coding genes can arise de novo from non-genic loci. This evolutionary innovation is thought to be facilitated by the pervasive translation of non-genic transcripts, which exposes a reservoir of variable polypeptides to natural selection. Here, we systematically characterize how these de novo emerging coding sequences impact fitness in budding yeast. Disruption of emerging sequences is generally inconsequential for fitness in the laboratory and in natural populations. Overexpression of emerging sequences, however, is enriched in adaptive fitness effects compared to overexpression of established genes. We find that adaptive emerging sequences tend to encode putative transmembrane domains, and that thymine-rich intergenic regions harbor a widespread potential to produce transmembrane domains. These findings, together with in-depth examination of the de novo emerging YBR196C-A locus, suggest a novel evolutionary model whereby adaptive transmembrane polypeptides emerge de novo from thymine-rich non-genic regions and subsequently accumulate changes molded by natural selection. There is increasing evidence that protein-coding genes can emerge de novo from noncoding genomic regions. Vakirlis et al. propose that sequences encoding transmembrane polypeptides can emerge de novo in thymine-rich genomic regions and provide organisms with fitness benefits.
Affiliation(s)
- Nikolaos Vakirlis
- Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin 2, Ireland
- Omer Acar
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Brian Hsu
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA, 92093, United States
- Nelson Castilho Coelho
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- S Branden Van Oss
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Aaron Wacholder
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Kate Medetgul-Ernar
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA, 92093, United States
- Ray W Bowman
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, 15260, United States
- Cameron P Hines
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA, 92093, United States
- John Iannotta
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Saurin Bipin Parikh
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Aoife McLysaght
- Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin 2, Ireland
- Carlos J Camacho
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Allyson F O'Donnell
- Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, 15260, United States
- Trey Ideker
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA, 92093, United States
- Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
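The Vakirlis et al. abstract above links thymine-rich intergenic regions to putative transmembrane domains. A minimal Python sketch of that kind of scan, using the classic Kyte-Doolittle hydropathy criterion (mean over a 19-residue window above 1.6) as a simple stand-in for the dedicated transmembrane predictors a study like this would more likely rely on. The function names and thresholds are illustrative assumptions, not the paper's method.

```python
# Kyte-Doolittle hydropathy index per amino acid.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def thymine_fraction(dna: str) -> float:
    """Fraction of T in a DNA sequence (0.0 for an empty sequence)."""
    dna = dna.upper()
    return dna.count("T") / len(dna) if dna else 0.0


def putative_tm_windows(protein: str, window: int = 19, threshold: float = 1.6) -> list[int]:
    """Start indices of windows whose mean hydropathy exceeds the threshold."""
    protein = protein.upper()
    hits = []
    for i in range(len(protein) - window + 1):
        segment = protein[i:i + window]
        if any(aa not in KD for aa in segment):
            continue  # skip windows containing stops ('*') or unknown residues
        if sum(KD[aa] for aa in segment) / window > threshold:
            hits.append(i)
    return hits
```

One intuition for the link the abstract describes: T-rich codons such as TTT (Phe), ATT (Ile) and TTA (Leu) encode strongly hydrophobic residues, so translating T-rich sequence tends to produce high-hydropathy stretches of the kind this scan flags.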
39
Arendsee Z, Li J, Singh U, Bhandary P, Seetharam A, Wurtele ES. fagin: synteny-based phylostratigraphy and finer classification of young genes. BMC Bioinformatics 2019; 20:440. [PMID: 31455236 PMCID: PMC6712868 DOI: 10.1186/s12859-019-3023-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 03/14/2019] [Accepted: 08/08/2019] [Indexed: 12/30/2022]
Abstract
BACKGROUND With every new genome that is sequenced, thousands of species-specific genes (orphans) are found, some originating from ultra-rapid mutations of existing genes, many others originating de novo from non-genic regions of the genome. If some of these genes survive across speciations, then extant organisms will contain a patchwork of genes whose ancestors first appeared at different times. Standard phylostratigraphy, the technique of partitioning genes by their age, is based solely on protein similarity algorithms. However, this approach relies on negative evidence: a failure to detect a homolog of a query gene. An alternative approach is to limit the search for homologs to syntenic regions. Then, genes can be positively identified as de novo orphans by tracing them to non-coding sequences in related species. RESULTS We have developed a synteny-based pipeline in the R framework. fagin determines the genomic context of each query gene in a focal species compared to homologous sequence in target species. We tested the fagin pipeline on two focal species, Arabidopsis thaliana (plus four target species in Brassicaceae) and Saccharomyces cerevisiae (plus six target species in Saccharomyces). Using microsynteny maps, fagin classified the homology relationship of each query gene against each target genome into three main classes, and further subclasses: AAic (has a coding syntenic homolog), NTic (has a non-coding syntenic homolog), and Unknown (has no detected syntenic homolog). fagin inferred that over half of the "Unknown" A. thaliana query genes, and about 20% of those for S. cerevisiae, lacked a syntenic homolog because of local indels or scrambled synteny. CONCLUSIONS fagin augments standard phylostratigraphy, and extends synteny-based phylostratigraphy with an automated, customizable, and detailed contextual analysis.
By comparing synteny-based phylostrata to standard phylostrata, fagin systematically identifies those orphans and lineage-specific genes that are well-supported to have originated de novo. Analyzing within-species genomes should distinguish orphan genes that may have originated through rapid divergence from de novo orphans. Fagin also delineates whether a gene has no syntenic homolog because of technical or biological reasons. These analyses indicate that some orphans may be associated with regions of high genomic perturbation.
Affiliation(s)
- Zebulun Arendsee
- Department of Genetics Development and Cell Biology, Iowa State University, Ames, IA, 50010, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA, 50011, USA
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, 50011, USA
- Jing Li
- Department of Genetics Development and Cell Biology, Iowa State University, Ames, IA, 50010, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA, 50011, USA
- Urminder Singh
- Department of Genetics Development and Cell Biology, Iowa State University, Ames, IA, 50010, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA, 50011, USA
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, 50011, USA
- Priyanka Bhandary
- Department of Genetics Development and Cell Biology, Iowa State University, Ames, IA, 50010, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA, 50011, USA
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, 50011, USA
- Arun Seetharam
- Genome Informatics Facility, Office of Biotechnology, Iowa State University, Ames, IA, 50011, USA
- Eve Syrkin Wurtele
- Department of Genetics Development and Cell Biology, Iowa State University, Ames, IA, 50010, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA, 50011, USA
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, 50011, USA
| |
40
Xie C, Bekpen C, Künzel S, Keshavarz M, Krebs-Wheaton R, Skrabar N, Ullrich KK, Tautz D. A de novo evolved gene in the house mouse regulates female pregnancy cycles. eLife 2019; 8:e44392. [PMID: 31436535 PMCID: PMC6760900 DOI: 10.7554/elife.44392] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 12/13/2018] [Accepted: 08/21/2019] [Indexed: 12/16/2022]
Abstract
The de novo emergence of new genes has been well documented through genomic analyses. However, a functional analysis, especially of very young protein-coding genes, is still largely lacking. Here, we identify a set of house mouse-specific protein-coding genes and assess their translation by ribosome profiling and mass spectrometry data. We functionally analyze one of them, Gm13030, which is specifically expressed in females in the oviduct. The interruption of the reading frame affects the transcriptional network in the oviducts at a specific stage of the estrous cycle. This includes the upregulation of Dcpp genes, which are known to stimulate the growth of preimplantation embryos. As a consequence, knockout females have their second litters after shorter intervals and have a higher infanticide rate. Given that Gm13030 shows no signs of positive selection, our findings support the hypothesis that a de novo evolved gene can directly adopt a function without much sequence adaptation.
Different species have specific genes that set them apart from other species. Yet exactly how these species-specific genes originate is not fully known. The traditional view is that existing old genes are duplicated to make a ‘spare’ copy, which can gradually change through mutations into a new gene with a new role. Despite there being lots of evidence supporting this theory, not all new genes found in recent years can be traced back to older genes. This led to an alternative view: that recently evolved genes can also appear ‘de novo’, from regions of random DNA sequence that did not previously code for a protein. So far, the possibility of genes forming de novo during evolution has largely been supported by comparing and analyzing the genomes of related species. However, very little is known about the biological role these de novo genes play. Now, Xie et al. have generated a list of recently evolved de novo mouse genes, and carried out a detailed analysis of one de novo gene expressed in females at the time when embryos implant into the uterus wall. To study the role of this gene, Xie et al. created a strain of knockout mice that produce a defunct version of the protein coded by the gene. Loss of this protein caused female mice to have their second litter after a shorter period of time and increased the likelihood that female mice would kill their newborn pups. This suggests that this newly discovered de novo gene is involved in regulating the female reproductive cycle of mice. Further analysis showed that this de novo gene counteracts the action of an older gene that promotes the implantation of embryos. The gene has therefore likely evolved because of the benefit it offers mothers, as it protects them from the increased physiological stress caused by a premature second pregnancy. These findings support the idea that genes which have evolved de novo can have an essential biological purpose despite coming from random DNA sequences. This establishes de novo gene evolution as the second major mechanism by which new genes with significant biological roles can form in the genome.
Affiliation(s)
- Chen Xie
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Cemalettin Bekpen
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Sven Künzel
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Maryam Keshavarz
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Rebecca Krebs-Wheaton
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Neva Skrabar
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Kristian Karsten Ullrich
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Diethard Tautz
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
41
Nielly-Thibault L, Landry CR. Differences Between the Raw Material and the Products of de Novo Gene Birth Can Result from Mutational Biases. Genetics 2019; 212:1353-1366. [PMID: 31227545 PMCID: PMC6707459 DOI: 10.1534/genetics.119.302187] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/18/2019] [Accepted: 06/14/2019] [Indexed: 12/03/2022]
Abstract
Proteins are among the most important constituents of biological systems. Because all protein-coding genes have a noncoding ancestral form, the properties of noncoding sequences and how they shape the birth of novel proteins may influence the structure and function of all proteins. Differences between the properties of young proteins and random expectations from noncoding sequences have previously been interpreted as the result of natural selection. However, interpreting such deviations requires a yet-unattained understanding of the raw material of de novo gene birth and its relation to novel functional proteins. We mathematically show that the average properties and selective filtering of the "junk" polypeptides of which this raw material is composed are not the only factors influencing the properties of novel functional proteins. We find that in some biological scenarios, they also depend on the variance of the properties of junk polypeptides and their correlation with the rate of allelic turnover, which may itself depend on mutational biases. This suggests for instance that any property of polypeptides that accelerates their exploration of the sequence space could be overrepresented in novel functional proteins, even if it has a limited effect on adaptive value. To exemplify the use of our general theoretical results, we build a simple model that predicts the mean length and mean intrinsic disorder of novel functional proteins from the genomic GC content and a single evolutionary parameter. This work provides a theoretical framework that can guide the prediction and interpretation of results when studying the de novo emergence of protein-coding genes.
Affiliation(s)
- Lou Nielly-Thibault
- Institut de Biologie Intégrative et des Systèmes, Université Laval, Quebec, Quebec G1V 0A6, Canada
- Département de Biologie, Université Laval, Quebec, Quebec G1V 0A6, Canada
- Département de Biochimie, de Microbiologie et de Bio-Informatique, Université Laval, Quebec, Quebec G1V 0A6, Canada
- PROTEO, Quebec, Quebec G1V 0A6, Canada
- Christian R Landry
- Institut de Biologie Intégrative et des Systèmes, Université Laval, Quebec, Quebec G1V 0A6, Canada
- Département de Biologie, Université Laval, Quebec, Quebec G1V 0A6, Canada
- Département de Biochimie, de Microbiologie et de Bio-Informatique, Université Laval, Quebec, Quebec G1V 0A6, Canada
- PROTEO, Quebec, Quebec G1V 0A6, Canada
42
Durand É, Gagnon-Arsenault I, Hallin J, Hatin I, Dubé AK, Nielly-Thibault L, Namy O, Landry CR. Turnover of ribosome-associated transcripts from de novo ORFs produces gene-like characteristics available for de novo gene emergence in wild yeast populations. Genome Res 2019; 29:932-943. [PMID: 31152050 PMCID: PMC6581059 DOI: 10.1101/gr.239822.118] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 05/25/2018] [Accepted: 05/13/2019] [Indexed: 12/17/2022]
Abstract
Little is known about the rate of emergence of de novo genes, what their initial properties are, and how they spread in populations. We examined wild yeast populations (Saccharomyces paradoxus) to characterize the diversity and turnover of intergenic ORFs over short evolutionary timescales. We find that hundreds of intergenic ORFs show translation signatures similar to canonical genes, and we experimentally confirmed the translation of many of these ORFs in laboratory conditions using a reporter assay. Compared with canonical genes, intergenic ORFs have lower translation efficiency, which could imply a lack of optimization for translation or a mechanism to reduce their production cost. Translated intergenic ORFs also tend to have sequence properties that are generally close to those of random intergenic sequences. However, some of the very recent translated intergenic ORFs, which appeared <110 kya, already show gene-like characteristics, suggesting that the raw material for functional innovations could appear over short evolutionary timescales.
Affiliation(s)
- Éléonore Durand
- Institut de Biologie Intégrative et des Systèmes, Département de Biologie, PROTEO, Centre de Recherche en Données Massives de l'Université Laval, Pavillon Charles-Eugène-Marchand, Université Laval, G1V 0A6 Québec, Québec, Canada
- Isabelle Gagnon-Arsenault
- Institut de Biologie Intégrative et des Systèmes, Département de Biologie, PROTEO, Centre de Recherche en Données Massives de l'Université Laval, Pavillon Charles-Eugène-Marchand, Université Laval, G1V 0A6 Québec, Québec, Canada
- Département de Biochimie, Microbiologie et Bio-informatique, Université Laval, G1V 0A6 Québec, Québec, Canada
- Johan Hallin
- Institut de Biologie Intégrative et des Systèmes, Département de Biologie, PROTEO, Centre de Recherche en Données Massives de l'Université Laval, Pavillon Charles-Eugène-Marchand, Université Laval, G1V 0A6 Québec, Québec, Canada
- Département de Biochimie, Microbiologie et Bio-informatique, Université Laval, G1V 0A6 Québec, Québec, Canada
- Isabelle Hatin
- Institut de Biologie Intégrative de la Cellule (I2BC), CEA, CNRS, Université Paris-Sud, Université Paris-Saclay, 91190 Gif sur Yvette, France
- Alexandre K Dubé
- Institut de Biologie Intégrative et des Systèmes, Département de Biologie, PROTEO, Centre de Recherche en Données Massives de l'Université Laval, Pavillon Charles-Eugène-Marchand, Université Laval, G1V 0A6 Québec, Québec, Canada
- Département de Biochimie, Microbiologie et Bio-informatique, Université Laval, G1V 0A6 Québec, Québec, Canada
- Lou Nielly-Thibault
- Institut de Biologie Intégrative et des Systèmes, Département de Biologie, PROTEO, Centre de Recherche en Données Massives de l'Université Laval, Pavillon Charles-Eugène-Marchand, Université Laval, G1V 0A6 Québec, Québec, Canada
- Olivier Namy
- Institut de Biologie Intégrative de la Cellule (I2BC), CEA, CNRS, Université Paris-Sud, Université Paris-Saclay, 91190 Gif sur Yvette, France
- Christian R Landry
- Institut de Biologie Intégrative et des Systèmes, Département de Biologie, PROTEO, Centre de Recherche en Données Massives de l'Université Laval, Pavillon Charles-Eugène-Marchand, Université Laval, G1V 0A6 Québec, Québec, Canada
- Département de Biochimie, Microbiologie et Bio-informatique, Université Laval, G1V 0A6 Québec, Québec, Canada
43
Affiliation(s)
- Stephen Branden Van Oss
- Department of Computational and Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America
- Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America
44
Rapid evolution of protein diversity by de novo origination in Oryza. Nat Ecol Evol 2019; 3:679-690. [PMID: 30858588 DOI: 10.1038/s41559-019-0822-5] [Citation(s) in RCA: 85] [Impact Index Per Article: 17.0] [Received: 05/08/2018] [Accepted: 01/23/2019] [Indexed: 12/22/2022]
Abstract
New protein-coding genes that arise de novo from non-coding DNA sequences contribute to protein diversity. However, de novo gene origination is challenging to study as it requires high-quality reference genomes for closely related species, evidence for ancestral non-coding sequences, and transcription and translation of the new genes. High-quality genomes of 13 closely related Oryza species provide unprecedented opportunities to understand de novo origination events. Here, we identify a large number of young de novo genes with discernible recent ancestral non-coding sequences and evidence of translation. Using pipelines examining the synteny relationship between genomes and reciprocal-best whole-genome alignments, we detected at least 175 de novo open reading frames in the focal species O. sativa subspecies japonica, all of which were detected in RNA sequencing-based transcriptomes. Mass spectrometry-based targeted proteomics and ribosomal profiling show translational evidence for 57% of the de novo genes. During the recent divergence of Oryza, an average of 51.5 de novo genes per million years were generated and retained. We observed evolutionary patterns in which excess indels and early transcription were favoured during origination, with gene structure forming in a stepwise manner. These data reveal that de novo genes contribute to the rapid evolution of protein diversity under positive selection.
45
Exaptation at the molecular genetic level. Sci China Life Sci 2018; 62:437-452. [PMID: 30798493 DOI: 10.1007/s11427-018-9447-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 11/10/2018] [Accepted: 12/01/2018] [Indexed: 12/22/2022]
Abstract
The realization that body parts of animals and plants can be recruited or coopted for novel functions dates back to, or even predates, the observations of Darwin. S.J. Gould and E.S. Vrba recognized a mode of evolution of characters that differs from adaptation. The umbrella term aptation was supplemented with the concept of exaptation. Unlike adaptations, which are restricted to features built by selection for their current role, exaptations are features that currently enhance fitness, even though their present role was not a result of natural selection. Exaptations can also arise from nonaptations; these are characters which had previously been evolving neutrally. All nonaptations are potential exaptations. The concept of exaptation was expanded to the molecular genetic level, which aided greatly in understanding the enormous potential of neutrally evolving repetitive DNA (including transposed elements, formerly considered junk DNA) for the evolution of genes and genomes. The distinction between adaptations and exaptations is outlined in this review and examples are given. The review also elaborates on the fact that such distinctions are sometimes difficult to determine; this is a widespread phenomenon in biology, where continua abound and clear borders between states and definitions are rare.
46
Incipient de novo genes can evolve from frozen accidents that escaped rapid transcript turnover. Nat Ecol Evol 2018; 2:1626-1632. [DOI: 10.1038/s41559-018-0639-7] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 07/24/2017] [Accepted: 07/09/2018] [Indexed: 11/08/2022]
47
Mittal P, Brindle J, Stephen J, Plotkin JB, Kudla G. Codon usage influences fitness through RNA toxicity. Proc Natl Acad Sci U S A 2018; 115:8639-8644. [PMID: 30082392 PMCID: PMC6112741 DOI: 10.1073/pnas.1810022115] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Indexed: 11/24/2022]
Abstract
Many organisms are subject to selective pressure that gives rise to unequal usage of synonymous codons, known as codon bias. To experimentally dissect the mechanisms of selection on synonymous sites, we expressed several hundred synonymous variants of the GFP gene in Escherichia coli, and used quantitative growth and viability assays to estimate bacterial fitness. Unexpectedly, we found many synonymous variants whose expression was toxic to E. coli. Unlike previously studied effects of synonymous mutations, the effect that we discovered is independent of translation but depends on the production of toxic mRNA molecules. We identified RNA sequence determinants of toxicity and evolved suppressor strains that can tolerate the expression of toxic GFP variants. Genome sequencing of these suppressor strains revealed a cluster of promoter mutations that prevented toxicity by reducing mRNA levels. We conclude that translation-independent RNA toxicity is a previously unrecognized obstacle in bacterial gene expression.
Affiliation(s)
- Pragya Mittal
- Medical Research Council Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, EH4 2XU Edinburgh, United Kingdom
- James Brindle
- Medical Research Council Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, EH4 2XU Edinburgh, United Kingdom
- Julie Stephen
- Medical Research Council Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, EH4 2XU Edinburgh, United Kingdom
- Joshua B Plotkin
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104
- Grzegorz Kudla
- Medical Research Council Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, EH4 2XU Edinburgh, United Kingdom
48
Pellestor F, Gatinois V. Chromothripsis, a credible chromosomal mechanism in evolutionary process. Chromosoma 2018; 128:1-6. [DOI: 10.1007/s00412-018-0679-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 04/02/2018] [Revised: 07/31/2018] [Accepted: 08/02/2018] [Indexed: 01/17/2023]
49
Abstract
De novo genes are very important for evolutionary innovation. However, how these genes originate and spread remains largely unknown. To better understand this, we rigorously searched for de novo genes in Saccharomyces cerevisiae S288C and examined their spread and fixation in the population. Here, we identified 84 de novo genes that arose in S. cerevisiae S288C since its divergence from its sister groups. Transcriptome and ribosome profiling data revealed that at least 8 (10%) of the de novo genes are expressed, and at least 28 (33%) are translated, only under specific conditions. DNA microarray data, based on a 2-fold change threshold, showed that 87% of the de novo genes are regulated during various biological processes, such as nutrient utilization and sporulation. Our comparative and evolutionary analyses further revealed that some factors, including single nucleotide polymorphism (SNP)/indel mutation, high GC content, and DNA shuffling, contribute to the birth of de novo genes, while domestication and natural selection drive the spread and fixation of these genes. Finally, we also provide evidence suggesting the possible parallel origin of a de novo gene between S. cerevisiae and Saccharomyces paradoxus. Together, our study provides several new insights into the origin and spread of de novo genes.
Emergence of de novo genes has occurred in many lineages during evolution, but the birth, spread, and function of these genes remain unresolved. Here we searched for de novo genes in Saccharomyces cerevisiae S288C using rigorous methods that reduced the effects of poor annotation and genomic gaps on the identification of de novo genes. Through this analysis, we found 84 new genes originating de novo from previously noncoding regions, 87% of which are very likely involved in various biological processes. We noticed that 10% and 33% of de novo genes were expressed and translated, respectively, only under specific conditions; therefore, verification of de novo genes through transcriptome and ribosome profiling, especially from limited expression data, may underestimate the number of bona fide new genes. We further show that SNP/indel mutation, high GC content, and DNA shuffling could be involved in the birth of de novo genes, while domestication and natural selection drive the spread and fixation of these genes. Finally, we provide evidence suggesting the possible parallel origin of a new gene.
50