1
Rich A, Acar O, Carvunis AR. Massively integrated coexpression analysis reveals transcriptional regulation, evolution and cellular implications of the yeast noncanonical translatome. Genome Biol 2024; 25:183. [PMID: 38978079 PMCID: PMC11232214 DOI: 10.1186/s13059-024-03287-7] [Received: 06/20/2023] [Accepted: 05/20/2024] [Indexed: 07/10/2024] Open
Abstract
BACKGROUND: Recent studies uncovered pervasive transcription and translation of thousands of noncanonical open reading frames (nORFs) outside of annotated genes. The contribution of nORFs to cellular phenotypes is difficult to infer using conventional approaches because nORFs tend to be short, of recent de novo origins, and lowly expressed. Here we develop a dedicated coexpression analysis framework that accounts for low expression to investigate the transcriptional regulation, evolution, and potential cellular roles of nORFs in Saccharomyces cerevisiae.
RESULTS: Our results reveal that nORFs tend to be preferentially coexpressed with genes involved in cellular transport or homeostasis but rarely with genes involved in RNA processing. Mechanistically, we discover that young de novo nORFs located downstream of conserved genes tend to leverage their neighbors' promoters through transcription readthrough, resulting in high coexpression and high expression levels. Transcriptional piggybacking also influences the coexpression profiles of young de novo nORFs located upstream of genes, but to a lesser extent and without detectable impact on expression levels. Transcriptional piggybacking influences, but does not determine, the transcription profiles of de novo nORFs emerging near genes. About 40% of nORFs are not strongly coexpressed with any gene but are transcriptionally regulated nonetheless and tend to form entirely new transcription modules. We offer a web browser interface ( https://carvunislab.csb.pitt.edu/shiny/coexpression/ ) to efficiently query, visualize, and download our coexpression inferences.
CONCLUSIONS: Our results suggest that nORF transcription is highly regulated. Our coexpression dataset serves as an unprecedented resource for unraveling how nORFs integrate into cellular networks, contribute to cellular phenotypes, and evolve.
Affiliation(s)
- April Rich
  - Joint Carnegie Mellon University-University of Pittsburgh, University of Pittsburgh Computational Biology PhD Program, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
  - Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), University of Pittsburgh, Pittsburgh, PA, USA
- Omer Acar
  - Joint Carnegie Mellon University-University of Pittsburgh, University of Pittsburgh Computational Biology PhD Program, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
  - Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), University of Pittsburgh, Pittsburgh, PA, USA
- Anne-Ruxandra Carvunis
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
  - Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), University of Pittsburgh, Pittsburgh, PA, USA
2
Lebherz MK, Iyengar BR, Bornberg-Bauer E. Modeling Length Changes in De Novo Open Reading Frames during Neutral Evolution. Genome Biol Evol 2024; 16:evae129. [PMID: 38879874 DOI: 10.1093/gbe/evae129] [Accepted: 06/06/2024] [Indexed: 07/06/2024] Open
Abstract
For protein-coding genes to emerge de novo from non-genic DNA, the DNA sequence must gain an open reading frame (ORF) and the ability to be transcribed. The newborn de novo gene can further evolve to accumulate changes in its sequence. Consequently, it can also elongate or shrink with time. Existing literature shows that older de novo genes have longer ORFs, but it is not clear whether they elongated with time or have remained the same length since their inception. To address this question, we developed a mathematical model of ORF elongation as a Markov-jump process, and show that ORFs tend to keep their length over short evolutionary timescales. We also show that if change occurs it is likely to be a truncation. Our genomics and transcriptomics data analyses of seven Drosophila melanogaster populations are also in agreement with the model's prediction. We conclude that selection could facilitate ORF length extension, which may explain why longer ORFs were observed in old de novo genes in studies analysing longer evolutionary time scales. Alternatively, shorter ORFs may be purged because they may be less likely to yield functional proteins.
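One intuition behind the truncation result can be illustrated by counting single-nucleotide substitutions under the standard genetic code. The sketch below is not the authors' Markov-jump model. It is a minimal illustration that assumes uniform codon usage and equal rates for every substitution, and all function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"
STOPS = ("TAA", "TAG", "TGA")  # stop codons of the standard genetic code


def neighbors(codon):
    """Yield the 9 codons that differ from `codon` by one nucleotide."""
    for i, b in product(range(3), BASES):
        if b != codon[i]:
            yield codon[:i] + b + codon[i + 1:]


def stop_loss_count(stop):
    """Substitutions that turn a stop codon into a sense codon (possible elongation)."""
    return sum(1 for n in neighbors(stop) if n not in STOPS)


def stop_gain_count():
    """Substitutions, summed over all 61 sense codons, that create a stop codon."""
    sense = ["".join(c) for c in product(BASES, repeat=3) if "".join(c) not in STOPS]
    return sum(1 for c in sense for n in neighbors(c) if n in STOPS)


def expected_truncating(n_sense_codons):
    """Expected number of stop-creating substitutions in an ORF of the given length,
    assuming every sense codon is equally likely (an illustrative assumption)."""
    return n_sense_codons * stop_gain_count() / 61


if __name__ == "__main__":
    for stop in STOPS:
        print(stop, "stop-loss substitutions:", stop_loss_count(stop))
    print("stop-gain substitutions over all sense codons:", stop_gain_count())
    print("expected truncating substitutions, 100 codons:", expected_truncating(100))
```

Under these assumptions a single stop codon offers at most 8 stop-loss substitutions, whereas the number of stop-creating substitutions grows with ORF length, so truncating changes outnumber elongating ones once an ORF exceeds roughly 21 sense codons.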
Affiliation(s)
- Marie Kristin Lebherz
  - Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster 48149, Germany
- Bharat Ravi Iyengar
  - Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster 48149, Germany
- Erich Bornberg-Bauer
  - Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster 48149, Germany
  - Department of Protein Evolution, Max Planck Institute for Biology Tübingen, Max-Planck-Ring 5, Tübingen 72076, Germany
3
Vara C, Montañés JC, Albà MM. High Polymorphism Levels of De Novo ORFs in a Yoruba Human Population. Genome Biol Evol 2024; 16:evae126. [PMID: 38934859 PMCID: PMC11221430 DOI: 10.1093/gbe/evae126] [Received: 02/13/2024] [Revised: 05/08/2024] [Accepted: 06/01/2024] [Indexed: 06/28/2024] Open
Abstract
During evolution, new open reading frames (ORFs) with the potential to give rise to novel proteins continuously emerge. A recent compilation of noncanonical ORFs with translation signatures in humans has identified thousands of cases with a putative de novo origin. However, how they are distributed in the population is not known. Are they universally translated? Here, we use ribosome profiling data from 65 lymphoblastoid cell lines from individuals of Yoruba origin to investigate this question. We identify 2,587 de novo ORFs translated in at least one of the cell lines. In line with their de novo origin, the encoded proteins tend to be smaller than 100 amino acids and positively charged. We observe that the de novo ORFs are more polymorphic in the population than the set of canonical proteins, with a substantial fraction of them being translated in only some of the cell lines. Remarkably, this difference remains significant after controlling for differences in the translation levels. These results suggest that variations in the level of translation of de novo ORFs could be a relevant source of intraspecies phenotypic diversity in humans.
Affiliation(s)
- Covadonga Vara
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute, Barcelona, Spain
- José Carlos Montañés
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute, Barcelona, Spain
- M Mar Albà
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute, Barcelona, Spain
  - Catalan Institute for Research and Advanced Studies (ICREA), Barcelona, Spain
4
Lee U, Mozeika SM, Zhao L. A Synergistic, Cultivator Model of De Novo Gene Origination. Genome Biol Evol 2024; 16:evae103. [PMID: 38748819 PMCID: PMC11152449 DOI: 10.1093/gbe/evae103] [Accepted: 05/12/2024] [Indexed: 06/07/2024] Open
Abstract
The origin and fixation of evolutionarily young genes is a fundamental question in evolutionary biology. However, understanding the origins of newly evolved genes arising de novo from noncoding genomic sequences is challenging. This is partly due to the low likelihood that several neutral or nearly neutral mutations fix prior to the appearance of an important novel molecular function. This issue is particularly exacerbated in large effective population sizes where the effect of drift is small. To address this problem, we propose a regulation-focused, cultivator model for de novo gene evolution. This cultivator-focused model posits that each step in a novel variant's evolutionary trajectory is driven by well-defined, selectively advantageous functions for the cultivator genes, rather than solely by the de novo genes, emphasizing the critical role of genome organization in the evolution of new genes.
Affiliation(s)
- UnJin Lee
  - Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA
- Shawn M Mozeika
  - Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA
- Li Zhao
  - Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA
5
Wehbi S, Wheeler A, Morel B, Minh BQ, Lauretta DS, Masel J. Order of amino acid recruitment into the genetic code resolved by Last Universal Common Ancestor's protein domains. bioRxiv 2024:2024.04.13.589375. [PMID: 38659899 PMCID: PMC11042313 DOI: 10.1101/2024.04.13.589375] [Indexed: 04/26/2024]
Abstract
The current "consensus" order in which amino acids were added to the genetic code is based on potentially biased criteria such as absence of sulfur-containing amino acids from the Urey-Miller experiment which lacked sulfur. Even if inferred perfectly, abiotic abundance might not reflect abundance in the organisms in which the genetic code evolved. Here, we instead exploit the fact that proteins that emerged prior to the genetic code's completion are likely enriched in early amino acids and depleted in late amino acids. We identify the most ancient protein-coding sequences born prior to the archaeal-bacterial split. Amino acid usage in protein sequences whose ancestors date back to a single homolog in the Last Universal Common Ancestor (LUCA) largely matches the consensus order. However, our findings indicate that metal-binding (cysteine and histidine) and sulfur-containing (cysteine and methionine) amino acids were added to the genetic code much earlier than previously thought. Surprisingly, even more ancient protein sequences - those that had already diversified into multiple distinct copies in LUCA - show a different pattern to single copy LUCA sequences: significantly less depleted in the late amino acids tryptophan and tyrosine, and enriched rather than depleted in phenylalanine. This is compatible with at least some of these sequences predating the current genetic code. Their distinct enrichment patterns thus provide hints about earlier, alternative genetic codes.
Affiliation(s)
- Sawsan Wehbi
  - Genetics Graduate Interdisciplinary Program, University of Arizona, Tucson, Arizona, 85721, USA
- Andrew Wheeler
  - Genetics Graduate Interdisciplinary Program, University of Arizona, Tucson, Arizona, 85721, USA
- Benoit Morel
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Bui Quang Minh
  - School of Computing, Australian National University, Canberra, ACT, Australia
- Dante S Lauretta
  - Lunar and Planetary Laboratory, University of Arizona, Tucson, AZ 85721, USA
- Joanna Masel
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, 85721, USA
6
Middendorf L, Eicholt LA. Random, de novo, and conserved proteins: How structure and disorder predictors perform differently. Proteins 2024; 92:757-767. [PMID: 38226524 DOI: 10.1002/prot.26652] [Received: 07/18/2023] [Revised: 10/18/2023] [Accepted: 12/01/2023] [Indexed: 01/17/2024]
Abstract
Understanding the emergence and structural characteristics of de novo and random proteins is crucial for unraveling protein evolution and designing novel enzymes. However, experimental determination of their structures remains challenging. Recent advancements in protein structure prediction, particularly with AlphaFold2 (AF2), have expanded our knowledge of protein structures, but their applicability to de novo and random proteins is unclear. In this study, we investigate the structural predictions and confidence scores of AF2 and protein language model-based predictor ESMFold for de novo and conserved proteins from Drosophila and a dataset of comparable random proteins. We find that the structural predictions for de novo and random proteins differ significantly from conserved proteins. Interestingly, a positive correlation between disorder and confidence scores (pLDDT) is observed for de novo and random proteins, in contrast to the negative correlation observed for conserved proteins. Furthermore, the performance of structure predictors for de novo and random proteins is hampered by the lack of sequence identity. We also observe fluctuating median predicted disorder among different sequence length quartiles for random proteins, suggesting an influence of sequence length on disorder predictions. In conclusion, while structure predictors provide initial insights into the structural composition of de novo and random proteins, their accuracy and applicability to such proteins remain limited. Experimental determination of their structures is necessary for a comprehensive understanding. The positive correlation between disorder and pLDDT could imply a potential for conditional folding and transient binding interactions of de novo and random proteins.
Affiliation(s)
- Lasse Middendorf
  - Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
- Lars A Eicholt
  - Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
7
Mouhand A, Nakatani K, Kono F, Hippo Y, Matsuo T, Barthe P, Peters J, Suenaga Y, Tamada T, Roumestand C. 1H, 13C and 15N backbone and side-chain resonance assignments of the human oncogenic protein NCYM. Biomol NMR Assign 2024; 18:65-70. [PMID: 38526839 DOI: 10.1007/s12104-024-10169-3] [Received: 02/07/2024] [Accepted: 03/13/2024] [Indexed: 03/27/2024]
Abstract
NCYM is a cis-antisense gene of the MYCN oncogene and encodes an oncogenic protein that stabilizes MYCN via inhibition of GSK3β. High NCYM expression levels are associated with poor clinical outcomes in human neuroblastomas, and NCYM overexpression promotes distant metastasis in animal models of neuroblastoma. Using vacuum-ultraviolet circular dichroism and small-angle X-ray scattering, we previously showed that NCYM has high flexibility with partially folded structures; however, further structural characterization is required for the design of anti-cancer agents targeting NCYM. Here we report the 1H, 15N and 13C nuclear magnetic resonance assignments of NCYM. Secondary structure prediction using secondary chemical shifts and TALOS-N analysis demonstrates that the structure of NCYM is essentially disordered, even though residues in the central region of the peptide clearly show a propensity to adopt a dynamic helical structure. This preliminary study provides foundations for further analysis of the interaction between NCYM and potential partners.
Affiliation(s)
- Assia Mouhand
  - Centre de Biologie Structurale (CBS), CNRS, INSERM, Univ Montpellier, Montpellier, France
- Kazuma Nakatani
  - Laboratory of Evolutionary Oncology, Chiba Cancer Center Research Institute, Chiba, Japan
  - Graduate School of Medical and Pharmaceutical Sciences, Chiba University, Chiba, Japan
- Fumiaki Kono
  - Institute for Quantum Life Science, National Institutes for Quantum Science and Technology, Chiba, Japan
- Yoshitaka Hippo
  - Laboratory of Evolutionary Oncology, Chiba Cancer Center Research Institute, Chiba, Japan
  - Laboratory of Precision Tumor Model Systems, Chiba Cancer Center Research Institute, Chiba, Japan
- Tatsuhito Matsuo
  - Institute for Quantum Life Science, National Institutes for Quantum Science and Technology, Chiba, Japan
- Philippe Barthe
  - Centre de Biologie Structurale (CBS), CNRS, INSERM, Univ Montpellier, Montpellier, France
- Judith Peters
  - Institut Laue Langevin, 38042, Grenoble, France
  - Université Grenoble Alpes, CNRS, LiPhy, 38400, Grenoble, France
- Yusuke Suenaga
  - Laboratory of Evolutionary Oncology, Chiba Cancer Center Research Institute, Chiba, Japan
- Taro Tamada
  - Institute for Quantum Life Science, National Institutes for Quantum Science and Technology, Chiba, Japan
  - Department of Quantum Life Science, Graduate School of Science, Chiba University, Chiba, Japan
- Christian Roumestand
  - Centre de Biologie Structurale (CBS), CNRS, INSERM, Univ Montpellier, Montpellier, France
8
Andjus S, Szachnowski U, Vogt N, Gioftsidi S, Hatin I, Cornu D, Papadopoulos C, Lopes A, Namy O, Wery M, Morillon A. Pervasive translation of Xrn1-sensitive unstable long noncoding RNAs in yeast. RNA 2024; 30:662-679. [PMID: 38443115 PMCID: PMC11098462 DOI: 10.1261/rna.079903.123] [Received: 11/29/2023] [Accepted: 02/15/2024] [Indexed: 03/07/2024]
Abstract
Despite being predicted to lack coding potential, cytoplasmic long noncoding (lnc)RNAs can associate with ribosomes. However, the landscape and biological relevance of lncRNA translation remain poorly studied. In yeast, cytoplasmic Xrn1-sensitive unstable transcripts (XUTs) are targeted by nonsense-mediated mRNA decay (NMD), suggesting a translation-dependent degradation process. Here, we report that XUTs are pervasively translated, which impacts their decay. We show that XUTs globally accumulate upon translation elongation inhibition, but not when initial ribosome loading is impaired. Ribo-seq confirmed ribosomes binding to XUTs and identified ribosome-associated 5'-proximal small ORFs. Mechanistically, the NMD-sensitivity of XUTs mainly depends on the 3'-untranslated region length. Finally, we show that the peptide resulting from the translation of an NMD-sensitive XUT reporter exists in NMD-competent cells. Our work highlights the role of translation in the posttranscriptional metabolism of XUTs. We propose that XUT-derived peptides could be exposed to natural selection, while NMD restricts XUT levels.
Affiliation(s)
- Sara Andjus
  - ncRNA, Epigenetic and Genome Fluidity, Institut Curie, PSL University, Sorbonne Université, CNRS UMR3244, F-75248 Paris Cedex 05, France
- Ugo Szachnowski
  - ncRNA, Epigenetic and Genome Fluidity, Institut Curie, Sorbonne Université, CNRS UMR3244, F-75248 Paris Cedex 05, France
- Nicolas Vogt
  - ncRNA, Epigenetic and Genome Fluidity, Institut Curie, Sorbonne Université, CNRS UMR3244, F-75248 Paris Cedex 05, France
- Stamatia Gioftsidi
  - ncRNA, Epigenetic and Genome Fluidity, Institut Curie, Sorbonne Université, CNRS UMR3244, F-75248 Paris Cedex 05, France
- Isabelle Hatin
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- David Cornu
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Chris Papadopoulos
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Anne Lopes
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Olivier Namy
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Maxime Wery
  - ncRNA, Epigenetic and Genome Fluidity, Institut Curie, Sorbonne Université, CNRS UMR3244, F-75248 Paris Cedex 05, France
- Antonin Morillon
  - ncRNA, Epigenetic and Genome Fluidity, Institut Curie, Sorbonne Université, CNRS UMR3244, F-75248 Paris Cedex 05, France
9
Linnenbrink M, Breton G, Misra P, Pfeifle C, Dutheil JY, Tautz D. Experimental Evaluation of a Direct Fitness Effect of the De Novo Evolved Mouse Gene Pldi. Genome Biol Evol 2024; 16:evae084. [PMID: 38742287 PMCID: PMC11091481 DOI: 10.1093/gbe/evae084] [Accepted: 04/16/2024] [Indexed: 05/16/2024] Open
Abstract
De novo evolved genes emerge from random parts of noncoding sequences and have, therefore, no homologs from which a function could be inferred. While expression analysis and knockout experiments can provide insights into the function, they do not directly test whether the gene is beneficial for its carrier. Here, we have used a seminatural environment experiment to test the fitness of the previously identified de novo evolved mouse gene Pldi, which has been implicated in sperm differentiation. We used a knockout mouse strain for this gene and competed it against its parental wildtype strain for several generations of free reproduction. We found that the knockout (ko) allele frequency decreased consistently across three replicates of the experiment. Using an approximate Bayesian computation framework that simulated the data under a demographic scenario mimicking the experiment's demography, we could estimate a selection coefficient ranging between 0.21 and 0.61 for the wildtype allele compared to the ko allele in males, under various models. This implies a relatively strong selective advantage, which would fix the new gene in fewer than a hundred generations after its emergence.
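The closing claim can be sanity-checked with a deterministic back-of-the-envelope recursion. The sketch below is not the authors' approximate Bayesian computation. It assumes a haploid-style model in which the wildtype allele has relative fitness 1 + s in every carrier (the abstract reports the effect in males only), ignores drift, and uses hypothetical start and target frequencies of 0.001 and 0.999.

```python
def generations_to_reach(s, p0=0.001, p_target=0.999):
    """Count generations for an allele with relative fitness 1 + s to rise
    from frequency p0 to p_target under deterministic haploid selection.

    p0 and p_target are illustrative assumptions, not values from the study.
    """
    p = p0
    generations = 0
    while p < p_target:
        # Standard one-locus haploid selection recursion.
        p = p * (1 + s) / (1 + p * s)
        generations += 1
    return generations


if __name__ == "__main__":
    # Bounds of the selection coefficient range quoted in the abstract.
    for s in (0.21, 0.61):
        print(f"s = {s}: {generations_to_reach(s)} generations")
```

Under these simplifying assumptions, both ends of the reported range reach the target frequency in fewer than 100 generations, which is in line with the abstract's statement.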
Affiliation(s)
- Miriam Linnenbrink
  - Department of Evolutionary Genetics, Max-Planck Institute for Evolutionary Biology, 24306 Plön, Germany
  - Present address: Max Planck Institute for Biological Intelligence, 82152 Martinsried, Germany
- Gwenna Breton
  - Department of Evolutionary Genetics, Max-Planck Institute for Evolutionary Biology, 24306 Plön, Germany
  - Present address: Clinical Genomics Gothenburg, Science for Life Laboratory, Sahlgrenska Academy, University of Gothenburg, and Center for Medical Genomics, Department of Clinical Genetic and Genomics, Sahlgrenska University Hospital, Sweden
- Pallavi Misra
  - Department of Evolutionary Genetics, Max-Planck Institute for Evolutionary Biology, 24306 Plön, Germany
  - Present address: Laboratory Corporation of America (LabCorp), Westborough, MA 01581, USA
- Christine Pfeifle
  - Department of Evolutionary Genetics, Max-Planck Institute for Evolutionary Biology, 24306 Plön, Germany
- Julien Y Dutheil
  - Department of Evolutionary Genetics, Max-Planck Institute for Evolutionary Biology, 24306 Plön, Germany
- Diethard Tautz
  - Department of Evolutionary Genetics, Max-Planck Institute for Evolutionary Biology, 24306 Plön, Germany
10
uz-Zaman MH, D’Alton S, Barrick JE, Ochman H. Promoter recruitment drives the emergence of proto-genes in a long-term evolution experiment with Escherichia coli. PLoS Biol 2024; 22:e3002418. [PMID: 38713714 PMCID: PMC11101190 DOI: 10.1371/journal.pbio.3002418] [Received: 10/18/2023] [Revised: 05/17/2024] [Accepted: 04/18/2024] [Indexed: 05/09/2024] Open
Abstract
The phenomenon of de novo gene birth (the emergence of genes from non-genic sequences) has received considerable attention due to the widespread occurrence of genes that are unique to particular species or genomes. Most instances of de novo gene birth have been recognized through comparative analyses of genome sequences in eukaryotes, despite the abundance of novel, lineage-specific genes in bacteria and the relative ease with which bacteria can be studied in an experimental context. Here, we explore the genetic record of the Escherichia coli long-term evolution experiment (LTEE) for changes indicative of "proto-genic" phases of new gene birth in which non-genic sequences evolve stable transcription and/or translation. Over the time span of the LTEE, non-genic regions are frequently transcribed, translated and differentially expressed, with levels of transcription across low-expressed regions increasing in later generations of the experiment. Proto-genes formed downstream of new mutations result either from insertion element activity or chromosomal translocations that fused preexisting regulatory sequences to regions that were not expressed in the LTEE ancestor. Additionally, we identified instances of proto-gene emergence in which a previously unexpressed sequence was transcribed after formation of an upstream promoter, although such cases were rare compared to those caused by recruitment of preexisting promoters. Tracing the origin of the causative mutations, we discovered that most occurred early in the history of the LTEE, often within the first 20,000 generations, and became fixed soon after emergence. Our findings show that proto-genes emerge frequently within evolving populations, can persist stably, and can serve as potential substrates for new gene formation.
Affiliation(s)
- Md. Hassan uz-Zaman
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
- Simon D’Alton
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
- Jeffrey E. Barrick
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
- Howard Ochman
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
11
Chen J, Landback P, Arsala D, Guzzetta A, Xia S, Atlas J, Sosa D, Zhang YE, Cheng J, Shen B, Long M. Evolutionarily new genes in humans with disease phenotypes reveal functional enrichment patterns shaped by adaptive innovation and sexual selection. bioRxiv 2024:2023.11.14.567139. [PMID: 38045239 PMCID: PMC10690195 DOI: 10.1101/2023.11.14.567139] [Indexed: 12/05/2023]
Abstract
New genes (or young genes) are genetic novelties pivotal in mammalian evolution. Their phenotypic impacts and evolutionary pattern over time, however, remain elusive in humans due to the technical and ethical complexities in functional studies. By combining human gene age dating and Mendelian disease phenotyping, our research reveals a gradual increase in disease gene proportions with gene age. Logistic regression modeling indicates that this increase could be related to longer protein lengths and higher burdens of deleterious de novo germline variants (DNVs) for older genes. We also find a steady integration of new genes with biomedical phenotypes into the human genome over macroevolutionary timescales (~0.07% per million years). Despite this stable pace, we observe distinct patterns in phenotypic enrichment, pleiotropy, and selective pressures across gene ages. Notably, young genes show significant enrichment in diseases related to the male reproductive system, indicating strong sexual selection. Young genes also exhibit disease-related functions in tissues and systems potentially linked to human phenotypic innovations, such as increased brain size, musculoskeletal phenotypes, and color vision. We further reveal a logistic growth pattern of pleiotropy over evolutionary time, indicating a diminishing marginal growth of new functions for older genes due to intensifying selective constraints over time. We propose a "pleiotropy-barrier" model that delineates higher potentials of phenotypic innovation for young genes than for older genes, a process subject to natural selection. Our study demonstrates that evolutionarily new genes are critical in influencing human reproductive evolution and adaptive phenotypic innovations driven by sexual and natural selection, with low pleiotropy as a selective advantage.
Affiliation(s)
- Jianhai Chen
  - Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
  - Institutes for Systems Genetics, West China University Hospital, Chengdu 610041, China
- Patrick Landback
  - Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Deanna Arsala
  - Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Alexander Guzzetta
  - Department of Pathology, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Shengqian Xia
  - Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Jared Atlas
  - Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Dylan Sosa
  - Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Yong E. Zhang
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Jingqiu Cheng
  - Institutes for Systems Genetics, West China University Hospital, Chengdu 610041, China
- Bairong Shen
  - Institutes for Systems Genetics, West China University Hospital, Chengdu 610041, China
- Manyuan Long
  - Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
12
Cuevas-Zuviría B, Garcia AK, Rivier AJ, Rucker HR, Carruthers BM, Kaçar B. Emergence of an Orphan Nitrogenase Protein Following Atmospheric Oxygenation. Mol Biol Evol 2024; 41:msae067. [PMID: 38526235 PMCID: PMC11018506 DOI: 10.1093/molbev/msae067] [Received: 12/08/2023] [Revised: 03/06/2024] [Accepted: 03/19/2024] [Indexed: 03/26/2024] Open
Abstract
Molecular innovations within key metabolisms can have profound impacts on element cycling and ecological distribution. Yet, much of the molecular foundations of early evolved enzymes and metabolisms are unknown. Here, we bring one such mystery to relief by probing the birth and evolution of the G-subunit protein, an integral component of certain members of the nitrogenase family, the only enzymes capable of biological nitrogen fixation. The G-subunit is a Paleoproterozoic-age orphan protein that appears more than 1 billion years after the origin of nitrogenases. We show that the G-subunit arose with novel nitrogenase metal dependence and the ecological expansion of nitrogen-fixing microbes following the transition in environmental metal availabilities and atmospheric oxygenation that began ∼2.5 billion years ago. We identify molecular features that suggest early G-subunit proteins mediated cofactor or protein interactions required for novel metal dependency, priming ancient nitrogenases and their hosts to exploit these newly diversified geochemical environments. We further examined the degree of functional specialization in G-subunit evolution with extant and ancestral homologs using laboratory reconstruction experiments. Our results indicate that permanent recruitment of the orphan protein depended on the prior establishment of conserved molecular features and showcase how contingent evolutionary novelties might shape ecologically important microbial innovations.
Affiliation(s)
- Amanda K Garcia
- Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA
- Alex J Rivier
- Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA
- Holly R Rucker
- Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA
- Brooke M Carruthers
- Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA
- Betül Kaçar
- Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA
13
Aubel M, Buchel F, Heames B, Jones A, Honc O, Bornberg-Bauer E, Hlouchova K. High-throughput Selection of Human de novo-emerged sORFs with High Folding Potential. Genome Biol Evol 2024; 16:evae069. [PMID: 38597156 PMCID: PMC11024478 DOI: 10.1093/gbe/evae069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 03/11/2024] [Accepted: 03/23/2024] [Indexed: 04/11/2024] Open
Abstract
De novo genes emerge from previously noncoding stretches of the genome. Their encoded de novo proteins are generally expected to be similar to random sequences and, accordingly, with no stable tertiary fold and high predicted disorder. However, structural properties of de novo proteins and whether they differ during the stages of emergence and fixation have not been studied in depth and rely heavily on predictions. Here we generated a library of short human putative de novo proteins of varying lengths and ages and sorted the candidates according to their structural compactness and disorder propensity. Using Förster resonance energy transfer combined with Fluorescence-activated cell sorting, we were able to screen the library for most compact protein structures, as well as most elongated and flexible structures. We find that compact de novo proteins are on average slightly shorter and contain lower predicted disorder than less compact ones. The predicted structures for most and least compact de novo proteins correspond to expectations in that they contain more secondary structure content or higher disorder content, respectively. Our experiments indicate that older de novo proteins have higher compactness and structural propensity compared with young ones. We discuss possible evolutionary scenarios and their implications underlying the age-dependencies of compactness and structural content of putative de novo proteins.
Affiliation(s)
- Margaux Aubel
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
- Filip Buchel
- Department of Cell Biology, Faculty of Science, Charles University, Prague, Czech Republic
- Department of Biochemistry, Faculty of Science, Charles University, Prague, Czech Republic
- Brennen Heames
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
- Alun Jones
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
- Ondrej Honc
- Imaging Methods Core Facility, BIOCEV, Prague, Czech Republic
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
- Department of Protein Evolution, Max Planck-Institute for Biology Tuebingen, Tuebingen, Germany
- Klara Hlouchova
- Department of Cell Biology, Faculty of Science, Charles University, Prague, Czech Republic
- Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague, Czech Republic
14
Delihas N. Evolution of a Human-Specific De Novo Open Reading Frame and Its Linked Transcriptional Silencer. Int J Mol Sci 2024; 25:3924. [PMID: 38612733 PMCID: PMC11011693 DOI: 10.3390/ijms25073924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Revised: 03/23/2024] [Accepted: 03/26/2024] [Indexed: 04/14/2024] Open
Abstract
In the human genome, two short open reading frames (ORFs) separated by a transcriptional silencer and a small intervening sequence stem from the gene SMIM45. The two ORFs show different translational characteristics, and they also show divergent patterns of evolutionary development. The studies presented here describe the evolution of the components of SMIM45. One ORF consists of an ultra-conserved 68 amino acid (aa) sequence, whose origins can be traced beyond the evolutionary age of divergence of the elephant shark, ~462 MYA. The silencer also has ancient origins, but it has a complex and divergent pattern of evolutionary formation, as it overlaps both at the 68 aa ORF and the intervening sequence. The other ORF consists of 107 aa. It develops during primate evolution but is found to originate de novo from an ancestral non-coding genomic region with root origins within the Afrothere clade of placental mammals, whose evolutionary age of divergence is ~99 MYA. The formation of the complete 107 aa ORF during primate evolution is outlined, whereby sequence development is found to occur through biased mutations, with disruptive random mutations that also occur but lead to a dead-end. The 107 aa ORF is of particular significance, as there is evidence to suggest it is a protein that may function in human brain development. Its evolutionary formation presents a view of a human-specific ORF and its linked silencer that were predetermined in non-primate ancestral species. The genomic position of the silencer offers interesting possibilities for the regulation of transcription of the 107 aa ORF. A hypothesis is presented with respect to possible spatiotemporal expression of the 107 aa ORF in embryonic tissues.
Affiliation(s)
- Nicholas Delihas
- Department of Microbiology and Immunology, Renaissance School of Medicine, Stony Brook University, Stony Brook, NY 11794, USA
15
Fleck K, Luria V, Garag N, Karger A, Hunter T, Marten D, Phu W, Nam KM, Sestan N, O’Donnell-Luria AH, Erceg J. Functional associations of evolutionarily recent human genes exhibit sensitivity to the 3D genome landscape and disease. bioRxiv 2024:2024.03.17.585403. [PMID: 38559085 PMCID: PMC10980080 DOI: 10.1101/2024.03.17.585403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Genome organization is intricately tied to regulating genes and associated cell fate decisions. In this study, we examine the positioning and functional significance of human genes, grouped by their evolutionary age, within the 3D organization of the genome. We reveal that genes of different evolutionary origin have distinct positioning relationships with both domains and loop anchors, and remarkably consistent relationships with boundaries across cell types. While the functional associations of each group of genes are primarily cell type-specific, such associations of conserved genes maintain greater stability across 3D genomic features and disease than recently evolved genes. Furthermore, the expression of these genes across various tissues follows an evolutionary progression, such that RNA levels increase from young genes to ancient genes. Thus, the distinct relationships of gene evolutionary age, function, and positioning within 3D genomic features contribute to tissue-specific gene regulation in development and disease.
Affiliation(s)
- Katherine Fleck
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269
- Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269
- Victor Luria
- Department of Neuroscience, Yale School of Medicine, New Haven, CT 06510
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA 02115
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115
- Nitanta Garag
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269
- Amir Karger
- IT-Research Computing, Harvard Medical School, Boston, MA 02115
- Trevor Hunter
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269
- Daniel Marten
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA 02115
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142
- William Phu
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA 02115
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142
- Kee-Myoung Nam
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT 06510
- Nenad Sestan
- Department of Neuroscience, Yale School of Medicine, New Haven, CT 06510
- Anne H. O’Donnell-Luria
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA 02115
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142
- Department of Pediatrics, Harvard Medical School, Boston, MA 02115
- Jelena Erceg
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269
- Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT 06030
16
Ye W, Krishna Behra PR, Dyrhage K, Seeger C, Joiner JD, Karlsson E, Andersson E, Chi CN, Andersson SGE, Jemth P. Folded Alpha Helical Putative New Proteins from Apilactobacillus kunkeei. J Mol Biol 2024; 436:168490. [PMID: 38355092 DOI: 10.1016/j.jmb.2024.168490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Revised: 02/07/2024] [Accepted: 02/08/2024] [Indexed: 02/16/2024]
Abstract
The emergence of new proteins is a central question in biology. Most tertiary protein folds known to date appear to have an ancient origin, but it is clear from bioinformatic analyses that new proteins continuously emerge in all organismal groups. However, there is a paucity of experimental data on new proteins regarding their structure and biophysical properties. We performed a detailed phylogenetic analysis and identified 48 putative open reading frames in the honeybee-associated bacterium Apilactobacillus kunkeei for which no or few homologs could be identified in closely-related species, suggesting that they could be relatively new on an evolutionary time scale and represent recently evolved proteins. Using circular dichroism-, fluorescence- and nuclear magnetic resonance (NMR) spectroscopy we investigated six of these proteins and show that they are not intrinsically disordered, but populate alpha-helical dominated folded states with relatively low thermodynamic stability (0-3 kcal/mol). The NMR and biophysical data demonstrate that small new proteins readily adopt simple folded conformations suggesting that more complex tertiary structures can be continuously re-invented during evolution by fusion of such simple secondary structure elements. These findings have implications for the general view on protein evolution, where de novo emergence of folded proteins may be a common event.
Affiliation(s)
- Weihua Ye
- Department of Medical Biochemistry and Microbiology, Uppsala University, BMC Box 582, 75123 Uppsala, Sweden
- Phani Rama Krishna Behra
- Department of Molecular Evolution, Cell and Molecular Biology, Biomedical Centre, Science for Life Laboratory, Uppsala University, 75236 Uppsala, Sweden
- Karl Dyrhage
- Department of Molecular Evolution, Cell and Molecular Biology, Biomedical Centre, Science for Life Laboratory, Uppsala University, 75236 Uppsala, Sweden
- Christian Seeger
- Department of Molecular Evolution, Cell and Molecular Biology, Biomedical Centre, Science for Life Laboratory, Uppsala University, 75236 Uppsala, Sweden
- Joe D Joiner
- Department of Medical Biochemistry and Microbiology, Uppsala University, BMC Box 582, 75123 Uppsala, Sweden
- Elin Karlsson
- Department of Medical Biochemistry and Microbiology, Uppsala University, BMC Box 582, 75123 Uppsala, Sweden
- Eva Andersson
- Department of Medical Biochemistry and Microbiology, Uppsala University, BMC Box 582, 75123 Uppsala, Sweden
- Celestine N Chi
- Department of Medical Biochemistry and Microbiology, Uppsala University, BMC Box 582, 75123 Uppsala, Sweden.
- Siv G E Andersson
- Department of Molecular Evolution, Cell and Molecular Biology, Biomedical Centre, Science for Life Laboratory, Uppsala University, 75236 Uppsala, Sweden.
- Per Jemth
- Department of Medical Biochemistry and Microbiology, Uppsala University, BMC Box 582, 75123 Uppsala, Sweden.
17
Rives N, Lamba V, Christina Cheng CH, Zhuang X. Diverse origins of near-identical antifreeze proteins in unrelated fish lineages provide insights into evolutionary mechanisms of new gene birth and protein sequence convergence. bioRxiv 2024:2024.03.12.584730. [PMID: 38559027 PMCID: PMC10980009 DOI: 10.1101/2024.03.12.584730] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Determining the origins of novel genes and the genetic mechanisms underlying the emergence of new functions is challenging yet crucial for understanding evolutionary innovations. The novel fish antifreeze proteins, exemplifying convergent evolution, represent excellent opportunities to investigate the evolutionary origins and pathways of new genes. Particularly notable are the near-identical type I antifreeze proteins (AFPI) in four phylogenetically divergent fish taxa. This study tested the hypothesis of protein sequence convergence beyond functional convergence in three unrelated AFPI-bearing fish lineages, revealing different paths by which a similar protein arose from diverse genomic resources. Comprehensive comparative analyses of the de novo sequenced genomes of the winter flounder and grubby sculpin, the available high-quality genome of the cunner, and those of 14 other relevant species found that the near-identical AFPI originated from a distinct genetic precursor in each lineage, and independently evolved coding regions for the novel ice-binding protein while retaining sequence identity in the regulatory regions with their respective ancestors. The deduced evolutionary processes and molecular mechanisms are consistent with the Innovation-Amplification-Divergence (IAD) model applicable to AFPI formation in all three lineages, a new Duplication-Degeneration-Divergence (DDD) model we propose for the sculpin lineage, and a DDD model with gene fission for the cunner lineage. This investigation illustrates the multiple ways by which a novel functional gene with sequence convergence at the protein level could evolve across divergent species, advancing our understanding of the mechanistic intricacies in new gene formation.
Affiliation(s)
- Nathan Rives
- Department of Biological Sciences, University of Arkansas, Fayetteville, AR, USA
- Vinita Lamba
- Department of Biological Sciences, University of Arkansas, Fayetteville, AR, USA
- C.-H. Christina Cheng
- Department of Evolution, Ecology and Behavior, University of Illinois, Urbana-Champaign, IL, USA
- Xuan Zhuang
- Department of Biological Sciences, University of Arkansas, Fayetteville, AR, USA
18
Liu X, Xiao C, Xu X, Zhang J, Mo F, Chen JY, Delihas N, Zhang L, An NA, Li CY. Origin of functional de novo genes in humans from "hopeful monsters". Wiley Interdiscip Rev RNA 2024; 15:e1845. [PMID: 38605485 DOI: 10.1002/wrna.1845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 03/13/2024] [Accepted: 03/18/2024] [Indexed: 04/13/2024]
Abstract
For a long time, it was believed that new genes arise only from modifications of preexisting genes, but the discovery of de novo protein-coding genes that originated from noncoding DNA regions demonstrates the existence of a "motherless" origination process for new genes. However, the features, distributions, expression profiles, and origin modes of these genes in humans seem to support the notion that their origin is not a purely "motherless" process; rather, these genes arise preferentially from genomic regions encoding preexisting precursors with gene-like features. In such a case, the gene loci are typically not brand new. In this short review, we will summarize the definition and features of human de novo genes and clarify their process of origination from ancestral non-coding genomic regions. In addition, we define the favored precursors, or "hopeful monsters," for the origin of de novo genes and present a discussion of the functional significance of these young genes in brain development and tumorigenesis in humans. This article is categorized under: RNA Evolution and Genomics > RNA and Ribonucleoprotein Evolution.
Affiliation(s)
- Xiaoge Liu
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Chunfu Xiao
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Xinwei Xu
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Jie Zhang
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Fan Mo
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Jia-Yu Chen
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Chemistry and Biomedicine Innovation Center (ChemBIC), Nanjing University, Nanjing, China
- Nicholas Delihas
- Department of Microbiology and Immunology, Renaissance School of Medicine, Stony Brook University, Stony Brook, New York, USA
- Li Zhang
- Chinese Institute for Brain Research, Beijing, China
- Ni A An
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Chuan-Yun Li
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Chinese Institute for Brain Research, Beijing, China
- Southwest United Graduate School, Kunming, China
19
Luthra I, Jensen C, Chen XE, Salaudeen AL, Rafi AM, de Boer CG. Regulatory activity is the default DNA state in eukaryotes. Nat Struct Mol Biol 2024; 31:559-567. [PMID: 38448573 DOI: 10.1038/s41594-024-01235-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/29/2023] [Accepted: 01/29/2024] [Indexed: 03/08/2024]
Abstract
Genomes encode for genes and non-coding DNA, both capable of transcriptional activity. However, unlike canonical genes, many transcripts from non-coding DNA have limited evidence of conservation or function. Here, to determine how much biological noise is expected from non-genic sequences, we quantify the regulatory activity of evolutionarily naive DNA using RNA-seq in yeast and computational predictions in humans. In yeast, more than 99% of naive DNA bases were transcribed. Unlike the evolved transcriptome, naive transcripts frequently overlapped with opposite sense transcripts, suggesting selection favored coherent gene structures in the yeast genome. In humans, regulation-associated chromatin activity is predicted to be common in naive dinucleotide-content-matched randomized DNA. Here, naive and evolved DNA have similar co-occurrence and cell-type specificity of chromatin marks, challenging these as indicators of selection. However, in both yeast and humans, extreme high activities were rare in naive DNA, suggesting they result from selection. Overall, basal regulatory activity seems to be the default, which selection can hone to evolve a function or, if detrimental, repress.
Affiliation(s)
- Ishika Luthra
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Cassandra Jensen
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Xinyi E Chen
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Asfar Lathif Salaudeen
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Abdul Muntakim Rafi
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Carl G de Boer
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada.
20
Dayi M. Evolution of parasitism genes in the plant parasitic nematodes. Sci Rep 2024; 14:3733. [PMID: 38355886 PMCID: PMC10866927 DOI: 10.1038/s41598-024-54330-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Accepted: 02/11/2024] [Indexed: 02/16/2024] Open
Abstract
The plant-parasitic nematodes are considered as one of the most destructive pests, from which the migratory and sedentary endoparasitic plant parasitic nematodes infect more than 4000 plant species and cause over $100 billion crop losses annually worldwide. These nematodes use multiple strategies to infect their host and to establish a successful parasitism inside the host such as cell-wall degradation enzymes, inhibition of host defense proteins, and molecular mimicry. In the present study, the main parasitism-associated gene families were identified and compared between the migratory and sedentary endoparasitic nematodes. The results showed that the migratory and sedentary endoparasitic nematodes share a core conserved parasitism mechanism established throughout the evolution of parasitism. However, genes involved in pectin degradation and hydrolase activity are rapidly evolving in the migratory endoparasitic nematodes. Additionally, cell-wall degrading enzymes such as GH45 cellulases and pectate lyase and peptidase and peptidase inhibitors were expanded in the migratory endoparasitic nematodes. The molecular mimicry mechanism was another key finding that differs between the endoparasitic and sedentary parasitic nematodes. The PL22 gene family, which is believed to play a significant role in the molecular mechanisms of nematode parasitism, has been found to be present exclusively in migratory endoparasitic nematodes. Phylogenetic analysis has suggested that it was de novo born in these nematodes. This discovery sheds new light on the molecular evolution of these parasites and has significant implications for our understanding of their biology and pathogenicity. This study contributes to our understanding of core parasitism mechanisms conserved throughout the nematodes and provides unique clues on the evolution of parasitism and the direction shaped by the host.
Affiliation(s)
- Mehmet Dayi
- Forestry Vocational School, Düzce University, Konuralp Campus, 81620, Düzce, Turkey.
- Faculty of Medicine, University of Miyazaki, Miyazaki, Japan.
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, 277-8562, Japan.
21
Pires JF, Grattão CC, Gomes RMR. The challenges for early intervention and its effects on the prognosis of autism spectrum disorder: a systematic review. Dement Neuropsychol 2024; 18:e20230034. [PMID: 38425700 PMCID: PMC10901562 DOI: 10.1590/1980-5764-dn-2023-0034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Revised: 10/04/2023] [Accepted: 11/22/2023] [Indexed: 03/02/2024] Open
Abstract
Autism spectrum disorder (ASD) is expressed with neurobehavioral symptoms of different degrees of intensity. It is estimated that, for every three cases detected, there are two cases that reach adulthood without treatment. OBJECTIVE To establish what challenges are still present in the implementation of early intervention (EI) and its effects on the prognosis of ASD. METHODS A systematic review using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology was carried out in the PubMed and ScienceDirect databases in January 2023. The search keywords were "autism spectrum disorder", "early intervention" and "prognosis". RESULTS Sixteen studies were included, two randomized and 14 non-randomized. Knowledge about the signs of ASD, diagnostic and therapeutic methods, age at the start of treatment, and socioeconomic factors were the main challenges encountered in the implementation of EI. CONCLUSION EI is capable of modifying the prognosis of ASD, and challenges in its implementation persist, especially in developing regions with low socioeconomic status.
22
Hannon Bozorgmehr J. Four classic "de novo" genes all have plausible homologs and likely evolved from retro-duplicated or pseudogenic sequences. Mol Genet Genomics 2024; 299:6. [PMID: 38315248 DOI: 10.1007/s00438-023-02090-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2023] [Accepted: 10/15/2023] [Indexed: 02/07/2024]
Abstract
Despite being previously regarded as extremely unlikely, the idea that entirely novel protein-coding genes can emerge from non-coding sequences has gradually become accepted over the past two decades. Examples of "de novo origination", resulting in lineage-specific "orphan" genes, lacking coding orthologs, are now produced every year. However, many are likely cases of duplicates that are difficult to recognize. Here, I re-examine the claims and show that four very well-known examples of genes alleged to have emerged completely "from scratch" (FLJ33706 in humans, Goddard in fruit flies, BSC4 in baker's yeast and AFGP2 in codfish) may have plausible evolutionary ancestors in pre-existing genes. The first two are likely highly diverged retrogenes coding for regulatory proteins that have been misidentified as orphans. The antifreeze glycoprotein, moreover, may not have evolved from repetitive non-genic sequences but, as in several other related cases, from an apolipoprotein that could have become pseudogenized before later being reactivated. These findings detract from various claims made about de novo gene birth and show there has been a tendency not to invest the necessary effort in searching for homologs outside of a very limited syntenic or phylostratigraphic methodology. A robust approach is used for improving detection that draws upon similarities, not just in terms of statistical sequence analysis, but also relating to biochemistry and function, to obviate notable failures to identify homologs.
23
Peng J, Zhao L. The origin and structural evolution of de novo genes in Drosophila. Nat Commun 2024; 15:810. [PMID: 38280868 PMCID: PMC10821953 DOI: 10.1038/s41467-024-45028-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Accepted: 01/09/2024] [Indexed: 01/29/2024] Open
Abstract
Recent studies reveal that de novo gene origination from previously non-genic sequences is a common mechanism for gene innovation. These young genes provide an opportunity to study the structural and functional origins of proteins. Here, we combine high-quality base-level whole-genome alignments and computational structural modeling to study the origination, evolution, and protein structures of lineage-specific de novo genes. We identify 555 de novo gene candidates in D. melanogaster that originated within the Drosophilinae lineage. Sequence composition, evolutionary rates, and expression patterns indicate possible gradual functional or adaptive shifts with their gene ages. Surprisingly, we find little overall protein structural changes in candidates from the Drosophilinae lineage. We identify several candidates with potentially well-folded protein structures. Ancestral sequence reconstruction analysis reveals that most potentially well-folded candidates are often born well-folded. Single-cell RNA-seq analysis in testis shows that although most de novo gene candidates are enriched in spermatocytes, several young candidates are biased towards the early spermatogenesis stage, indicating potentially important but less emphasized roles of early germline cells in the de novo gene origination in testis. This study provides a systematic overview of the origin, evolution, and protein structural changes of Drosophilinae-specific de novo genes.
Affiliation(s)
- Junhui Peng
- Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA
- Li Zhao
- Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA.
24
Oggenfuss U, Badet T, Croll D. A systematic screen for co-option of transposable elements across the fungal kingdom. Mob DNA 2024; 15:2. [PMID: 38245743 PMCID: PMC10799480 DOI: 10.1186/s13100-024-00312-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Accepted: 01/04/2024] [Indexed: 01/22/2024] Open
Abstract
How novel protein functions are acquired is a central question in molecular biology. Key paths to novelty include gene duplications, recombination or horizontal acquisition. Transposable elements (TEs) are increasingly recognized as a major source of novel domain-encoding sequences. However, the impact of TE coding sequences on the evolution of the proteome remains understudied. Here, we analyzed 1237 genomes spanning the phylogenetic breadth of the fungal kingdom. We scanned proteomes for evidence of co-occurrence of TE-derived domains along with other conventional protein functional domains. We detected more than 13,000 predicted proteins containing a potentially TE-derived domain, of which 825 were identified in more than five genomes, indicating that many host-TE fusions may have persisted over long evolutionary time scales. We used the phylogenetic context to identify the origin and retention of individual TE-derived domains. The most common TE-derived domains are helicases derived from Academ, Kolobok or Helitron. We found putative TE co-options at a higher rate in genomes of the Saccharomycotina, providing an unexpected source of protein novelty in these generally TE-depleted genomes. We investigated in detail a candidate host-TE fusion with a heterochromatic transcriptional silencing function that may play a role in TE and gene regulation in ascomycetes. The affected gene underwent multiple full or partial losses within the phylum. Overall, our work establishes a kingdom-wide view of putative host-TE fusions and facilitates systematic investigations of candidate fusion proteins.
Affiliation(s)
- Ursula Oggenfuss
- Laboratory of Evolutionary Genetics, Institute of Biology, University of Neuchâtel, CH-2000, Neuchâtel, Switzerland
- Department of Microbiology and Immunology, University of Minnesota, Medical School, Minneapolis, Minnesota, United States of America
- Thomas Badet
- Laboratory of Evolutionary Genetics, Institute of Biology, University of Neuchâtel, CH-2000, Neuchâtel, Switzerland
- Daniel Croll
- Laboratory of Evolutionary Genetics, Institute of Biology, University of Neuchâtel, CH-2000, Neuchâtel, Switzerland.
25
Grandchamp A, Czuppon P, Bornberg-Bauer E. Quantification and modeling of turnover dynamics of de novo transcripts in Drosophila melanogaster. Nucleic Acids Res 2024; 52:274-287. [PMID: 38000384 PMCID: PMC10783523 DOI: 10.1093/nar/gkad1079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 10/13/2023] [Accepted: 10/28/2023] [Indexed: 11/26/2023] Open
Abstract
Most of the transcribed eukaryotic genomes are composed of non-coding transcripts. Among these transcripts, some are newly transcribed when compared to outgroups and are referred to as de novo transcripts. De novo transcripts have been shown to play a major role in genomic innovations. However, little is known about the rates at which de novo transcripts are gained and lost in individuals of the same species. Here, we address this gap and estimate the de novo transcript turnover rate with an evolutionary model. We use DNA long reads and RNA short reads from seven geographically remote samples of inbred individuals of Drosophila melanogaster to detect de novo transcripts that are gained on a short evolutionary time scale. Overall, each sampled individual contains around 2500 unspliced de novo transcripts, with most of them being sample specific. We estimate that around 0.15 transcripts are gained per year, and that each gained transcript is lost at a rate around 5 × 10⁻⁵ per year. This high turnover of transcripts suggests frequent exploration of new genomic sequences within species. These rate estimates are essential to comprehend the process and timescale of de novo gene birth.
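Note on the rates quoted above: a minimal sketch of what they would imply under a simple constant-gain, proportional-loss model, dN/dt = gain - loss × N. This model is an assumption made here for illustration only; the abstract does not specify the authors' evolutionary model, which may differ. The function names are hypothetical.

```python
import math

# Rates quoted in the abstract above.
GAIN_PER_YEAR = 0.15        # de novo transcripts gained per year
LOSS_RATE_PER_YEAR = 5e-5   # loss rate per transcript per year


def equilibrium_count(gain, loss):
    """Steady state of the ASSUMED model dN/dt = gain - loss * N."""
    return gain / loss


def half_life_years(loss):
    """Half-life of a single transcript under exponential loss."""
    return math.log(2) / loss


def count_at(t_years, gain, loss, n0=0.0):
    """Closed-form solution N(t) of the assumed model, starting from n0."""
    n_star = gain / loss
    return n_star + (n0 - n_star) * math.exp(-loss * t_years)


if __name__ == "__main__":
    print(round(equilibrium_count(GAIN_PER_YEAR, LOSS_RATE_PER_YEAR)))
    print(round(half_life_years(LOSS_RATE_PER_YEAR)))
```

Under this assumed model the steady-state count is gain / loss = 0.15 / (5 × 10⁻⁵) = 3000 transcripts, and a single transcript would persist with a half-life of roughly 14,000 years. Whether this number is directly comparable to the roughly 2500 transcripts per individual reported in the abstract depends on details of the paper's model that are not given here.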
Affiliation(s)
- Anna Grandchamp
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Peter Czuppon
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Department of Protein Evolution, Max Planck Institute for Biology, Tübingen, Germany
26
Mönttinen HAM, Frilander MJ, Löytynoja A. Generation of de novo miRNAs from template switching during DNA replication. Proc Natl Acad Sci U S A 2023; 120:e2310752120. [PMID: 38019864 PMCID: PMC10710096 DOI: 10.1073/pnas.2310752120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Accepted: 11/01/2023] [Indexed: 12/01/2023] Open
Abstract
The mechanisms generating novel genes and genetic information are poorly known, even for microRNA (miRNA) genes with an extremely constrained design. All miRNA primary transcripts need to fold into a stem-loop structure to yield short gene products (~22 nt) that bind and repress their mRNA targets. While a substantial number of miRNA genes are ancient and highly conserved, short secondary structures coding for entirely novel miRNA genes have been shown to emerge in a lineage-specific manner. Template switching is a DNA-replication-related mutation mechanism that can introduce complex changes and generate perfect base pairing for entire hairpin structures in a single event. Here, we show that the template-switching mutations (TSMs) have participated in the emergence of over 6,000 suitable hairpin structures in the primate lineage to yield at least 18 new human miRNA genes, that is, 26% of the miRNAs inferred to have arisen since the origin of primates. While the mechanism appears random, the TSM-generated miRNAs are enriched in introns where they can be expressed with their host genes. The high frequency of TSM events provides raw material for evolution. Being orders of magnitude faster than other mechanisms proposed for de novo creation of genes, TSM-generated miRNAs enable near-instant rewiring of genetic information and rapid adaptation to changing environments.
Affiliation(s)
- Heli A. M. Mönttinen
- Institute of Biotechnology, Helsinki Institute of Life Science, University of Helsinki, Helsinki FI-000, Finland
- Mikko J. Frilander
- Institute of Biotechnology, Helsinki Institute of Life Science, University of Helsinki, Helsinki FI-000, Finland
- Ari Löytynoja
- Institute of Biotechnology, Helsinki Institute of Life Science, University of Helsinki, Helsinki FI-000, Finland
27
Kore H, Datta KK, Nagaraj SH, Gowda H. Protein-coding potential of non-canonical open reading frames in human transcriptome. Biochem Biophys Res Commun 2023; 684:149040. [PMID: 37897910 DOI: 10.1016/j.bbrc.2023.09.068] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/13/2023] [Revised: 09/09/2023] [Accepted: 09/23/2023] [Indexed: 10/30/2023]
Abstract
In recent years, proteogenomics and ribosome profiling studies have identified a large number of proteins encoded by noncoding regions in the human genome. They are encoded by small open reading frames (sORFs) in the untranslated regions (UTRs) of mRNAs and long non-coding RNAs (lncRNAs). These sORF-encoded proteins (SEPs) are often <150 amino acids long and show poor evolutionary conservation. A subset of them have been functionally characterized and shown to play an important role in fundamental biological processes including cardiac and muscle function, DNA repair, embryonic development and various human diseases. How many novel protein-coding regions exist in the human genome and what fraction of them are functionally important remains a mystery. In this review, we discuss current progress in unraveling SEPs, the approaches used for their identification, their limitations, and the reliability of these identifications. We also discuss functionally characterized SEPs and their involvement in various biological processes and diseases. Lastly, we provide insights into their distinctive features compared to canonical proteins and the challenges associated with annotating these in protein reference databases.
Affiliation(s)
- Hitesh Kore
- Centre for Genomics and Personalised Health, Queensland University of Technology, Brisbane, Queensland, 4059, Australia; Cancer Precision Medicine Group, QIMR Berghofer Medical Research Institute, 300 Herston Road, Herston, Queensland, 4006, Australia; Faculty of Health, Queensland University of Technology, Brisbane, Queensland, 4059, Australia.
- Keshava K Datta
- Proteomics and Metabolomics Platform, La Trobe University, Melbourne, VIC, 3083, Australia
- Shivashankar H Nagaraj
- Centre for Genomics and Personalised Health, Queensland University of Technology, Brisbane, Queensland, 4059, Australia; Faculty of Health, Queensland University of Technology, Brisbane, Queensland, 4059, Australia
- Harsha Gowda
- Centre for Genomics and Personalised Health, Queensland University of Technology, Brisbane, Queensland, 4059, Australia; Cancer Precision Medicine Group, QIMR Berghofer Medical Research Institute, 300 Herston Road, Herston, Queensland, 4006, Australia; Faculty of Health, Queensland University of Technology, Brisbane, Queensland, 4059, Australia; Faculty of Medicine, The University of Queensland, Queensland, 4072, Australia.
28
Fakhar AZ, Liu J, Pajerowska-Mukhtar KM, Mukhtar MS. The ORFans' tale: new insights in plant biology. Trends Plant Sci 2023; 28:1379-1390. [PMID: 37453923 DOI: 10.1016/j.tplants.2023.06.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2022] [Revised: 05/17/2023] [Accepted: 06/19/2023] [Indexed: 07/18/2023]
Abstract
Orphan genes (OGs) are protein-coding genes without a significant sequence similarity in closely related species. Despite their functional importance, very little is known about the underlying molecular mechanisms by which OGs participate in diverse biological processes. Here, we discuss the evolutionary mechanisms of OGs' emergence with relevance to species-specific adaptations. We also provide a mechanistic view of the involvement of OGs in multiple processes, including growth, development, reproduction, and carbon-metabolism-mediated immunity. We highlight the interconnection between OGs and the sucrose nonfermenting 1 (SNF1)-related protein kinases (SnRKs)-target of rapamycin (TOR) signaling axis for phytohormone signaling, nutrient metabolism, and stress responses. Finally, we propose a high-throughput pipeline for OGs' interspecies and intraspecies gene transfer through a transgenic approach for future biotechnological advances.
Affiliation(s)
- Ali Zeeshan Fakhar
- Department of Biology, University of Alabama at Birmingham, 1300 University Blvd., Birmingham, AL 35294, USA
- Jinbao Liu
- Department of Biology, University of Alabama at Birmingham, 1300 University Blvd., Birmingham, AL 35294, USA
- M Shahid Mukhtar
- Department of Biology, University of Alabama at Birmingham, 1300 University Blvd., Birmingham, AL 35294, USA.
29
Haltom J, Trovao NS, Guarnieri J, Vincent P, Singh U, Tsoy S, O'Leary CA, Bram Y, Widjaja GA, Cen Z, Meller R, Baylin SB, Moss WN, Nikolau BJ, Enguita FJ, Wallace DC, Beheshti A, Schwartz R, Wurtele ES. SARS-CoV-2 Orphan Gene ORF10 Contributes to More Severe COVID-19 Disease. medRxiv 2023:2023.11.27.23298847. [PMID: 38076862 PMCID: PMC10705665 DOI: 10.1101/2023.11.27.23298847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2024]
Abstract
The orphan gene of SARS-CoV-2, ORF10, is the least studied gene in the virus responsible for the COVID-19 pandemic. Recent experimentation indicated ORF10 expression moderates innate immunity in vitro. However, whether ORF10 affects COVID-19 in humans remained unknown. We determine that the ORF10 sequence is identical to the Wuhan-Hu-1 ancestral haplotype in 95% of genomes across five variants of concern (VOC). Four ORF10 variants are associated with less virulent clinical outcomes in the human host: three of these affect ORF10 protein structure, one affects ORF10 RNA structural dynamics. RNA-Seq data from 2070 samples from diverse human cells and tissues reveals ORF10 accumulation is conditionally discordant from that of other SARS-CoV-2 transcripts. Expression of ORF10 in A549 and HEK293 cells perturbs immune-related gene expression networks, alters expression of the majority of mitochondrially-encoded genes of oxidative respiration, and leads to large shifts in levels of 14 newly-identified transcripts. We conclude ORF10 contributes to more severe COVID-19 clinical outcomes in the human host.
Affiliation(s)
- Jeffrey Haltom
- Department of Genetics Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
- Center for Mitochondrial and Epigenomic Medicine, Division of Human Genetics, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- COVID-19 International Research Team, Medford, MA 02155, USA
- Nidia S Trovao
- Division of International Epidemiology and Population Studies, Fogarty International Center, National Institutes of Health, Bethesda, Maryland, 20892, USA
- COVID-19 International Research Team, Medford, MA 02155, USA
- Joseph Guarnieri
- Center for Mitochondrial and Epigenomic Medicine, Division of Human Genetics, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- COVID-19 International Research Team, Medford, MA 02155, USA
- Pan Vincent
- Division of International Epidemiology and Population Studies, Fogarty International Center, National Institutes of Health, Bethesda, Maryland, 20892, USA
- Urminder Singh
- Bioinformatics and Computational Biology Program, and Genetics Program, Iowa State University, Ames, IA 50011, USA
- Sergey Tsoy
- Division of Gastroenterology and Hepatology, Department of Medicine, Weill Cornell Medicine, New York, NY, USA
- Collin A O'Leary
- Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Yaron Bram
- Division of Gastroenterology and Hepatology, Department of Medicine, Weill Cornell Medicine, New York, NY, USA
- Gabrielle A Widjaja
- Center for Mitochondrial and Epigenomic Medicine, Division of Human Genetics, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Zimu Cen
- Center for Mitochondrial and Epigenomic Medicine, Division of Human Genetics, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Robert Meller
- Morehouse School of Medicine, Atlanta, GA, 30310-1495, USA
- Stephen B Baylin
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins, Baltimore, MD 21231
- Van Andel Research Institute, Grand Rapids, MI 49503
- Walter N Moss
- Bioinformatics and Computational Biology Program, and Genetics Program, Iowa State University, Ames, IA 50011, USA
- Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Basil J Nikolau
- Bioinformatics and Computational Biology Program, and Genetics Program, Iowa State University, Ames, IA 50011, USA
- Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Francisco J Enguita
- Instituto de Medicina Molecular João Lobo Antunes, Faculdade de Medicina, Universidade de Lisboa, 1649-028 Lisboa, Portugal
- Douglas C Wallace
- Center for Mitochondrial and Epigenomic Medicine, Division of Human Genetics, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pediatrics, Division of Human Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104 USA
- Afshin Beheshti
- COVID-19 International Research Team, Medford, MA 02155, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Blue Marble Space Institute of Science, Seattle, WA, 98104 USA
- Robert Schwartz
- Division of Gastroenterology and Hepatology, Department of Medicine, Weill Cornell Medicine, New York, NY, USA
- Department of Physiology, Biophysics and Systems Biology, Weill Cornell Medicine, New York, NY, USA
- Department of Biomedical Engineering, Cornell University, Ithaca, NY, USA
- Eve Syrkin Wurtele
- Bioinformatics and Computational Biology Program, and Genetics Program, Iowa State University, Ames, IA 50011, USA
- Department of Genetics Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
- COVID-19 International Research Team, Medford, MA 02155, USA
30
Chen J. Evolutionarily new genes in humans with disease phenotypes reveal functional enrichment patterns shaped by adaptive innovation and sexual selection. Research Square 2023:rs.3.rs-3632644. [PMID: 38045389 PMCID: PMC10690325 DOI: 10.21203/rs.3.rs-3632644/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
New genes (or young genes) are structural novelties pivotal in mammalian evolution. Their phenotypic impact on humans, however, remains elusive due to the technical and ethical complexities in functional studies. Through combining gene age dating with Mendelian disease phenotyping, our research reveals that new genes associated with disease phenotypes steadily integrate into the human genome at a rate of ~0.07% every million years over macroevolutionary timescales. Despite this stable pace, we observe distinct patterns in phenotypic enrichment, pleiotropy, and selective pressures between young and old genes. Notably, young genes show significant enrichment in the male reproductive system, indicating strong sexual selection. Young genes also exhibit functions in tissues and systems potentially linked to human phenotypic innovations, such as increased brain size, bipedal locomotion, and color vision. Our findings further reveal increasing levels of pleiotropy over evolutionary time, which accompanies stronger selective constraints. We propose a "pleiotropy-barrier" model that delineates different potentials for phenotypic innovation between young and older genes subject to natural selection. Our study demonstrates that evolutionarily new genes are critical in influencing human reproductive evolution and adaptive phenotypic innovations driven by sexual and natural selection, with low pleiotropy as a selective advantage.
31
Uz-Zaman MH, D'Alton S, Barrick JE, Ochman H. Promoter capture drives the emergence of proto-genes in Escherichia coli. bioRxiv 2023:2023.11.15.567300. [PMID: 38013999 PMCID: PMC10680751 DOI: 10.1101/2023.11.15.567300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
The phenomenon of de novo gene birth (the emergence of genes from non-genic sequences) has received considerable attention due to the widespread occurrence of genes that are unique to particular species or genomes. Most instances of de novo gene birth have been recognized through comparative analyses of genome sequences in eukaryotes, despite the abundance of novel, lineage-specific genes in bacteria and the relative ease with which bacteria can be studied in an experimental context. Here, we explore the genetic record of the Escherichia coli Long-Term Evolution Experiment (LTEE) for changes indicative of "proto-genic" phases of new gene birth in which non-genic sequences evolve stable transcription and/or translation. Over the time-span of the LTEE, non-genic regions are frequently transcribed, translated and differentially expressed, thereby serving as raw material for new gene emergence. Most proto-genes result either from insertion element activity or chromosomal translocations that fused pre-existing regulatory sequences to regions that were not expressed in the LTEE ancestor. Additionally, we identified instances of proto-gene emergence in which a previously unexpressed sequence was transcribed after formation of an upstream promoter. Tracing the origin of the causative mutations, we discovered that most occurred early in the history of the LTEE, often within the first 20,000 generations, and became fixed soon after emergence. Our findings show that proto-genes emerge frequently within evolving populations, persist stably, and can serve as potential substrates for new gene formation.
32
Mani S, Tlusty T. Gene birth in a model of non-genic adaptation. BMC Biol 2023; 21:257. [PMID: 37957718 PMCID: PMC10644530 DOI: 10.1186/s12915-023-01745-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2022] [Accepted: 10/24/2023] [Indexed: 11/15/2023] Open
Abstract
BACKGROUND Over evolutionary timescales, genomic loci can switch between functional and non-functional states through processes such as pseudogenization and de novo gene birth. Particularly, de novo gene birth is a widespread process, and many examples continue to be discovered across diverse evolutionary lineages. However, the general mechanisms that lead to functionalization are poorly understood, and estimated rates of de novo gene birth remain contentious. Here, we address this problem within a model that takes into account mutations and structural variation, allowing us to estimate the likelihood of emergence of new functions at non-functional loci. RESULTS Assuming biologically reasonable mutation rates and mutational effects, we find that functionalization of non-genic loci requires the realization of strict conditions. This is in line with the observation that most de novo genes are localized to the vicinity of established genes. Our model also provides an explanation for the empirical observation that emerging proto-genes are often lost despite showing signs of adaptation. CONCLUSIONS Our work elucidates the properties of non-genic loci that make them fertile for adaptation, and our results offer mechanistic insights into the process of de novo gene birth.
Affiliation(s)
- Somya Mani
- Center for Soft and Living Matter, Institute for Basic Science, Ulsan 44919, Republic of Korea.
- Tsvi Tlusty
- Center for Soft and Living Matter, Institute for Basic Science, Ulsan 44919, Republic of Korea
- Departments of Physics and Chemistry, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
33
Yocca AE, Platts A, Alger E, Teresi S, Mengist MF, Benevenuto J, Ferrão LFV, Jacobs M, Babinski M, Magallanes-Lundback M, Bayer P, Golicz A, Humann JL, Main D, Espley RV, Chagné D, Albert NW, Montanari S, Vorsa N, Polashock J, Díaz-Garcia L, Zalapa J, Bassil NV, Munoz PR, Iorizzo M, Edger PP. Blueberry and cranberry pangenomes as a resource for future genetic studies and breeding efforts. Hortic Res 2023; 10:uhad202. [PMID: 38023484 PMCID: PMC10673653 DOI: 10.1093/hr/uhad202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 10/01/2023] [Indexed: 12/01/2023]
Abstract
Domestication of cranberry and blueberry began in the United States in the early 1800s and 1900s, respectively, and in part owing to their flavors and health-promoting benefits, both are now cultivated and consumed worldwide. The industry continues to face a wide variety of production challenges (e.g. disease pressures), as well as a demand for higher-yielding cultivars with improved fruit quality characteristics. Unfortunately, molecular tools to help guide breeding efforts for these species have been relatively limited compared with those for other high-value crops. Here, we describe the construction and analysis of the first pangenome for both blueberry and cranberry. Our analysis of these pangenomes revealed that both crops exhibit great genetic diversity, including presence-absence variation in 48.4% of genes in highbush blueberry and 47.0% of genes in cranberry. Auxiliary genes, those not shared by all cultivars, are significantly enriched with molecular functions associated with disease resistance and the biosynthesis of specialized metabolites, including compounds previously associated with improving fruit quality traits. The discovery of thousands of genes not present in the previous reference genomes for blueberry and cranberry will serve as the basis of future research and as potential targets for future breeding efforts. The pangenome, as a multiple-sequence alignment, as well as individual annotated genomes, are publicly available for analysis on the Genome Database for Vaccinium, a curated and integrated web-based relational database. Lastly, the core-gene predictions from the pangenomes will be useful for developing a community genotyping platform to guide future molecular breeding efforts across the family.
Affiliation(s)
- Alan E Yocca
- Department of Horticulture, Michigan State University, East Lansing, MI, 48824, United States
- Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, United States
- Adrian Platts
- Department of Horticulture, Michigan State University, East Lansing, MI, 48824, United States
- Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, United States
- Elizabeth Alger
- Department of Horticulture, Michigan State University, East Lansing, MI, 48824, United States
- Scott Teresi
- Department of Horticulture, Michigan State University, East Lansing, MI, 48824, United States
- Genetics and Genome Sciences, Michigan State University, East Lansing, MI, 48824, United States
- Molla F Mengist
- Plants for Human Health Institute, North Carolina State University, Kannapolis, NC, United States
- Juliana Benevenuto
- Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, United States
- Luis Felipe V Ferrão
- Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, United States
- MacKenzie Jacobs
- Department of Horticulture, Michigan State University, East Lansing, MI, 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824, United States
- Michal Babinski
- Department of Horticulture, Michigan State University, East Lansing, MI, 48824, United States
- Philipp Bayer
- University of Western Australia, Perth 6009, Australia
- Jodi L Humann
- Department of Horticulture, Washington State University, Pullman, WA, 99163, United States
- Dorrie Main
- Department of Horticulture, Washington State University, Pullman, WA, 99163, United States
- Richard V Espley
- The New Zealand Institute for Plant and Food Research Limited (PFR), Auckland, New Zealand
- David Chagné
- The New Zealand Institute for Plant and Food Research Limited (PFR), Palmerston, New Zealand
- Nick W Albert
- The New Zealand Institute for Plant and Food Research Limited (PFR), Palmerston, New Zealand
- Sara Montanari
- The New Zealand Institute for Plant and Food Research Limited (PFR), Motueka, New Zealand
- Nicholi Vorsa
- SEBS, Plant Biology, Rutgers University, New Brunswick NJ 01019, United States
- James Polashock
- SEBS, Plant Biology, Rutgers University, New Brunswick NJ 01019, United States
- Luis Díaz-Garcia
- Department of Viticulture and Enology, University of California, Davis, Davis, CA 95616, United States
- Juan Zalapa
- Department of Viticulture and Enology, University of California, Davis, Davis, CA 95616, United States
- Nahla V Bassil
- National Clonal Germplasm Repository, USDA-ARS, Corvallis, OR 97333, United States
- Patricio R Munoz
- Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, United States
- Massimo Iorizzo
- Plants for Human Health Institute, North Carolina State University, Kannapolis, NC, United States
- Department of Horticulture, North Carolina State University, Kannapolis, NC, United States
- Patrick P Edger
- Department of Horticulture, Michigan State University, East Lansing, MI, 48824, United States
- Genetics and Genome Sciences, Michigan State University, East Lansing, MI, 48824, United States
- MSU AgBioResearch, Michigan State University, East Lansing, MI, 48824, United States
34
Zhou X, Peng T, Zeng Y, Cai Y, Zuo Q, Zhang L, Dong S, Liu Y. Chromosome-level genome assembly of Niphotrichum japonicum provides new insights into heat stress responses in mosses. Front Plant Sci 2023; 14:1271357. [PMID: 37920716 PMCID: PMC10619864 DOI: 10.3389/fpls.2023.1271357] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Accepted: 09/25/2023] [Indexed: 11/04/2023]
Abstract
With a diversity of approximately 22,000 species, bryophytes (hornworts, liverworts, and mosses) represent a major and diverse lineage of land plants. Bryophytes can thrive in many extreme environments as they can endure the stresses of drought, heat, and cold. The moss Niphotrichum japonicum (Grimmiaceae, Grimmiales) can subsist for extended periods under heat and drought conditions, providing a good candidate for studying the genetic basis underlying such high resilience. Here, we de novo assembled the genome of N. japonicum using Nanopore long reads combined with Hi-C scaffolding technology to anchor the 191.61 Mb assembly into 14 pseudochromosomes. The genome structure of N. japonicum's autosomes is mostly conserved and highly syntenic, in contrast to the sparse and disordered genes present in its sex chromosome. Comparative genomic analysis revealed the presence of 10,019 genes exclusively in N. japonicum. These genes may contribute to the species-specific resilience, as demonstrated by the gene ontology (GO) enrichment. Transcriptome analysis showed that 37.44% (including 3,107 unique genes) of the total annotated genes (26,898) exhibited differential expression as a result of heat-induced stress, and the mechanisms that respond to heat stress are generally conserved across plants. These include the upregulation of HSPs, LEAs, and reactive oxygen species (ROS) scavenging genes, and the downregulation of PPR genes. N. japonicum also appears to have distinctive thermal mechanisms, including species-specific expansion and upregulation of the Self-incomp_S1 gene family, functional divergence of duplicated genes, structural clusters of upregulated genes, and expression piggybacking of hub genes. Overall, our study highlights both shared and species-specific heat tolerance strategies in N. japonicum, providing valuable insights into the heat tolerance mechanism and the evolution of resilient plants.
Affiliation(s)
- Xuping Zhou
- Laboratory of Southern Subtropical Plant Diversity, Fairy Lake Botanical Garden, Shenzhen & Chinese Academy of Sciences, Shenzhen, China
- College of Life Sciences, Guizhou Normal University, Guiyang, China
- Tao Peng
- College of Life Sciences, Guizhou Normal University, Guiyang, China
- Yuying Zeng
- State Key Laboratory of Agricultural Genomics, BGI Research, Shenzhen, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Yuqing Cai
- State Key Laboratory of Agricultural Genomics, BGI Research, Shenzhen, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Qin Zuo
- Laboratory of Southern Subtropical Plant Diversity, Fairy Lake Botanical Garden, Shenzhen & Chinese Academy of Sciences, Shenzhen, China
- Li Zhang
- Laboratory of Southern Subtropical Plant Diversity, Fairy Lake Botanical Garden, Shenzhen & Chinese Academy of Sciences, Shenzhen, China
- Shanshan Dong
- Laboratory of Southern Subtropical Plant Diversity, Fairy Lake Botanical Garden, Shenzhen & Chinese Academy of Sciences, Shenzhen, China
- Yang Liu
- Laboratory of Southern Subtropical Plant Diversity, Fairy Lake Botanical Garden, Shenzhen & Chinese Academy of Sciences, Shenzhen, China
- State Key Laboratory of Agricultural Genomics, BGI Research, Shenzhen, China
35
Wang Z, Wang YW, Kasuga T, Hassler H, Lopez-Giraldez F, Dong C, Yarden O, Townsend JP. Origins of lineage-specific elements via gene duplication, relocation, and regional rearrangement in Neurospora crassa. Mol Ecol 2023. [PMID: 37843462 DOI: 10.1111/mec.17168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 09/20/2023] [Accepted: 09/27/2023] [Indexed: 10/17/2023]
Abstract
The origin of new genes has long been a central interest of evolutionary biologists. However, their novelty means that they evade reconstruction by the classical tools of evolutionary modelling. This evasion of deep ancestral investigation necessitates intensive study of model species within well-sampled, recently diversified, clades. One such clade is the model genus Neurospora, members of which lack recent gene duplications. Several Neurospora species are comprehensively characterized organisms apt for studying the evolution of lineage-specific genes (LSGs). Using gene synteny, we documented that 78% of Neurospora LSG clusters are located adjacent to the telomeres featuring extensive tracts of non-coding DNA and duplicated genes. Here, we report several instances of LSGs that are likely from regional rearrangements and potentially from gene rebirth. To broadly investigate the functions of LSGs, we assembled transcriptomics data from 68 experimental data points and identified co-regulatory modules using Weighted Gene Correlation Network Analysis, revealing that LSGs are widely but peripherally involved in known regulatory machinery for diverse functions. The ancestral status of the LSG mas-1, a gene with roles in cell-wall integrity and cellular sensitivity to antifungal toxins, was investigated in detail alongside its genomic neighbours, indicating that it arose from an ancient lysophospholipase precursor that is ubiquitous in lineages of the Sordariomycetes. Our discoveries illuminate a "rummage region" in the N. crassa genome that enables the formation of new genes and functions to arise via gene duplication and relocation, followed by fast mutation and recombination facilitated by sequence repeats and unconstrained non-coding sequences.
Affiliation(s)
- Zheng Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
| | - Yen-Wen Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
| | - Takao Kasuga
- College of Biological Sciences, University of California, Davis, Davis, California, USA
| | - Hayley Hassler
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
| | | | - Caihong Dong
- Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
| | - Oded Yarden
- Department of Plant Pathology and Microbiology, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
| | - Jeffrey P Townsend
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
- Department of Ecology and Evolutionary Biology, Program in Microbiology, and Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
| |
|
36
|
Ardern Z. Alternative Reading Frames are an Underappreciated Source of Protein Sequence Novelty. J Mol Evol 2023; 91:570-580. [PMID: 37326679 DOI: 10.1007/s00239-023-10122-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2022] [Accepted: 05/31/2023] [Indexed: 06/17/2023]
Abstract
Protein-coding DNA sequences can be translated into completely different amino acid sequences if the nucleotide triplets used are shifted by a non-triplet amount on the same DNA strand or by translating codons from the opposite strand. Such "alternative reading frames" of protein-coding genes are a major contributor to the evolution of novel protein products. Recent studies demonstrating this include examples across the three domains of cellular life and in viruses. These sequences increase the number of trials potentially available for the evolutionary invention of new genes and also have unusual properties which may facilitate gene origin. There is evidence that the structure of the standard genetic code contributes to the features and gene-likeness of some alternative frame sequences. These findings have important implications across diverse areas of molecular biology, including for genome annotation, structural biology, and evolutionary genomics.
|
37
|
González D, Morales-Olavarria M, Vidal-Veuthey B, Cárdenas JP. Insights into early evolutionary adaptations of the Akkermansia genus to the vertebrate gut. Front Microbiol 2023; 14:1238580. [PMID: 37779688 PMCID: PMC10540074 DOI: 10.3389/fmicb.2023.1238580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Accepted: 08/21/2023] [Indexed: 10/03/2023] Open
Abstract
Akkermansia, a relevant mucin degrader from the vertebrate gut microbiota, is a member of the deeply branched Verrucomicrobiota, as well as the only known member of this phylum to be described as an inhabitant of the gut. Only a few Akkermansia species have been officially described so far, although there is genomic evidence for the existence of more species-level variants in this genus. This niche specialization makes Akkermansia an interesting model for studying the adaptation of microorganisms to the gastrointestinal tract environment, including which functions were gained when the Akkermansia genus originated and how evolutionary pressure acts on those genes. In order to gain more insight into Akkermansia adaptations to the gastrointestinal tract niche, we performed a phylogenomic analysis of 367 high-quality Akkermansia isolates and metagenome-assembled genomes, in addition to other members of Verrucomicrobiota. This work focused on three aspects: the definition of Akkermansia genomic species clusters and the calculation and functional characterization of the pangenome for the most represented species; the evolutionary relationship between Akkermansia and their closest relatives from Verrucomicrobiota, defining the gene families which were gained or lost during the emergence of the last Akkermansia common ancestor (LAkkCA); and the evaluation of the evolutionary pressure metrics for each relevant gene family of the main Akkermansia species. This analysis found 25 Akkermansia genomic species clusters distributed in two main clades, divergent from their non-Akkermansia relatives. Pangenome analyses suggest that Akkermansia species have open pangenomes, and the gene gain/loss model indicates that genes associated with mucin degradation (both glycoside hydrolases and peptidases), (micro)aerobic metabolism, surface interaction, and adhesion were part of LAkkCA. Specifically, mucin degradation is a very ancestral innovation involved in the origin of Akkermansia. Horizontal gene transfer detection suggests that Akkermansia could receive genes mostly from unknown sources or from other Gram-negative gut bacteria. Evolutionary metrics suggest that Akkermansia species evolved differently, and even some conserved genes suffered different evolutionary pressures among clades. These results suggest a complex evolutionary landscape of the genus and indicate that mucin degradation could be an essential feature in Akkermansia evolution as a symbiotic species.
Affiliation(s)
- Dámariz González
- Centro de Genómica y Bioinformática, Facultad de Ciencias, Ingeniería y Tecnología, Universidad Mayor, Santiago, Chile
| | - Mauricio Morales-Olavarria
- Centro de Genómica y Bioinformática, Facultad de Ciencias, Ingeniería y Tecnología, Universidad Mayor, Santiago, Chile
| | - Boris Vidal-Veuthey
- Centro de Genómica y Bioinformática, Facultad de Ciencias, Ingeniería y Tecnología, Universidad Mayor, Santiago, Chile
| | - Juan P. Cárdenas
- Centro de Genómica y Bioinformática, Facultad de Ciencias, Ingeniería y Tecnología, Universidad Mayor, Santiago, Chile
- Escuela de Biotecnología, Facultad de Ciencias, Ingeniería y Tecnología, Universidad Mayor, Santiago, Chile
| |
|
38
|
Fuselli S, Greco S, Biello R, Palmitessa S, Lago M, Meneghetti C, McDougall C, Trucchi E, Rota Stabelli O, Biscotti AM, Schmidt DJ, Roberts DT, Espinoza T, Hughes JM, Ometto L, Gerdol M, Bertorelle G. Relaxation of Natural Selection in the Evolution of the Giant Lungfish Genomes. Mol Biol Evol 2023; 40:msad193. [PMID: 37671664 PMCID: PMC10503785 DOI: 10.1093/molbev/msad193] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/01/2023] [Revised: 07/16/2023] [Accepted: 09/04/2023] [Indexed: 09/07/2023] Open
Abstract
Nonadaptive hypotheses on the evolution of eukaryotic genome size predict an expansion when the process of purifying selection becomes weak. Accordingly, species with huge genomes, such as lungfish, are expected to show a genome-wide relaxation signature of selection compared with other organisms. However, few studies have empirically tested this prediction using genomic data in a comparative framework. Here, we show that 1) the newly assembled transcriptome of the Australian lungfish, Neoceratodus forsteri, is characterized by an excess of pervasive transcription, or transcriptional leakage, possibly due to suboptimal transcriptional control, and 2) a significant relaxation signature in coding genes in lungfish species compared with other vertebrates. Based on these observations, we propose that the largest known animal genomes evolved in a nearly neutral scenario where genome expansion is less efficiently constrained.
Affiliation(s)
- Silvia Fuselli
- Department of Life Sciences and Biotechnology, University of Ferrara, Ferrara, Italy
| | - Samuele Greco
- Department of Life Sciences, University of Trieste, Trieste, Italy
| | - Roberto Biello
- Department of Life Sciences and Biotechnology, University of Ferrara, Ferrara, Italy
| | | | - Marta Lago
- Department of Life Sciences and Biotechnology, University of Ferrara, Ferrara, Italy
| | - Corrado Meneghetti
- Department of Life Sciences and Biotechnology, University of Ferrara, Ferrara, Italy
| | - Carmel McDougall
- Australian Rivers Institute, Griffith University, Brisbane, Queensland, Australia
| | - Emiliano Trucchi
- Department of Life and Environmental Sciences, Marche Polytechnic University, Ancona, Italy
| | - Omar Rota Stabelli
- Research and Innovation Centre, Fondazione Edmund Mach, 38010 San Michele all’Adige, Italy
- Center Agriculture Food Environment, University of Trento, 38010 San Michele all'Adige, Italy
| | - Assunta Maria Biscotti
- Department of Life and Environmental Sciences, Marche Polytechnic University, Ancona, Italy
| | - Daniel J Schmidt
- Australian Rivers Institute, Griffith University, Brisbane, Queensland, Australia
| | | | | | - Jane Margaret Hughes
- Australian Rivers Institute, Griffith University, Brisbane, Queensland, Australia
| | - Lino Ometto
- Department of Biology and Biotechnology, University of Pavia, Pavia, Italy
| | - Marco Gerdol
- Department of Life Sciences, University of Trieste, Trieste, Italy
| | - Giorgio Bertorelle
- Department of Life Sciences and Biotechnology, University of Ferrara, Ferrara, Italy
| |
|
39
|
Menger FM, Rizvi SAA. Preassembly Theory Invoking Prehistoric DNA Alterations. WORLD FUTURES 2023; 79:635-646. [DOI: 10.1080/02604027.2023.2226594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/16/2023]
|
40
|
Meier-Credo J, Heiniger B, Schori C, Rupprecht F, Michel H, Ahrens CH, Langer JD. Detection of Known and Novel Small Proteins in Pseudomonas stutzeri Using a Combination of Bottom-Up and Digest-Free Proteomics and Proteogenomics. Anal Chem 2023; 95:11892-11900. [PMID: 37535005 PMCID: PMC10433244 DOI: 10.1021/acs.analchem.3c00676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Accepted: 07/24/2023] [Indexed: 08/04/2023]
Abstract
Small proteins of around 50 aa in length have been largely overlooked in genetic and biochemical assays due to the inherent challenges with detecting and characterizing them. Recent discoveries of their critical roles in many biological processes have led to an increased recognition of the importance of small proteins for basic research and as potential new drug targets. One example is CcoM, a 36 aa subunit of the cbb3-type oxidase that plays an essential role in adaptation to oxygen-limited conditions in Pseudomonas stutzeri (P. stutzeri), a model for the clinically relevant, opportunistic pathogen Pseudomonas aeruginosa. However, as no comprehensive data were available in P. stutzeri, we devised an integrated, generic approach to study small proteins more systematically. Using the first complete genome as basis, we conducted bottom-up proteomics analyses and established a digest-free, direct-sequencing proteomics approach to study cells grown under aerobic and oxygen-limiting conditions. Finally, we also applied a proteogenomics pipeline to identify missed protein-coding genes. Overall, we identified 2921 known and 29 novel proteins, many of which were differentially regulated. Among 176 small proteins 16 were novel. Direct sequencing, featuring a specialized precursor acquisition scheme, exhibited advantages in the detection of small proteins with higher (up to 100%) sequence coverage and more spectral counts, including sequences with high proline content. Three novel small proteins, uniquely identified by direct sequencing and not conserved beyond P. stutzeri, were predicted to form an operon with a conserved protein and may represent de novo genes. These data demonstrate the power of this combined approach to study small proteins in P. stutzeri and show its potential for other prokaryotes.
Affiliation(s)
- Jakob Meier-Credo
- Proteomics, Max Planck Institute of Biophysics, 60438 Frankfurt am Main, Germany
| | - Benjamin Heiniger
- Molecular Ecology, Agroscope & SIB Swiss Institute of Bioinformatics, 8046 Zürich, Switzerland
| | - Christian Schori
- Molecular Ecology, Agroscope & SIB Swiss Institute of Bioinformatics, 8046 Zürich, Switzerland
| | - Fiona Rupprecht
- Proteomics, Max Planck Institute for Brain Research, 60438 Frankfurt am Main, Germany
| | - Hartmut Michel
- Department of Molecular Membrane Biology, Max Planck Institute of Biophysics, 60438 Frankfurt am Main, Germany
| | - Christian H. Ahrens
- Molecular Ecology, Agroscope & SIB Swiss Institute of Bioinformatics, 8046 Zürich, Switzerland
| | - Julian D. Langer
- Proteomics, Max Planck Institute of Biophysics, 60438 Frankfurt am Main, Germany
- Proteomics, Max Planck Institute for Brain Research, 60438 Frankfurt am Main, Germany
| |
|
41
|
Lombardo KD, Sheehy HK, Cridland JM, Begun DJ. Identifying candidate de novo genes expressed in the somatic female reproductive tract of Drosophila melanogaster. G3 (BETHESDA, MD.) 2023; 13:jkad122. [PMID: 37259569 PMCID: PMC10411569 DOI: 10.1093/g3journal/jkad122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2023] [Revised: 05/18/2023] [Accepted: 05/22/2023] [Indexed: 06/02/2023]
Abstract
Most eukaryotic genes have been vertically transmitted to the present from distant ancestors. However, variable gene number across species indicates that gene gain and loss also occur. While new genes typically originate as products of duplications and rearrangements of preexisting genes, putative de novo genes (genes born out of ancestrally nongenic sequence) have been identified. Previous studies of de novo genes in Drosophila have provided evidence that expression in male reproductive tissues is common. However, no studies have focused on female reproductive tissues. Here we begin addressing this gap in the literature by analyzing the transcriptomes of 3 female reproductive tract organs (spermatheca, seminal receptacle, and parovaria) in 3 species: our focal species, Drosophila melanogaster, and 2 closely related species, Drosophila simulans and Drosophila yakuba, with the goal of identifying putative D. melanogaster-specific de novo genes expressed in these tissues. We discovered several candidate genes, located in sequence annotated as intergenic. Consistent with the literature, these genes tend to be short, single exon, and lowly expressed. We also find evidence that some of these genes are expressed in other D. melanogaster tissues and both sexes. The relatively small number of intergenic candidate genes discovered here is similar to that observed in the accessory gland, but substantially fewer than that observed in the testis.
Affiliation(s)
- Kaelina D Lombardo
- Department of Evolution and Ecology, University of California Davis, Davis, CA 95616, USA
| | - Hayley K Sheehy
- Department of Evolution and Ecology, University of California Davis, Davis, CA 95616, USA
| | - Julie M Cridland
- Department of Evolution and Ecology, University of California Davis, Davis, CA 95616, USA
| | - David J Begun
- Department of Evolution and Ecology, University of California Davis, Davis, CA 95616, USA
| |
|
42
|
Yocca AE, Platts A, Alger E, Teresi S, Mengist MF, Benevenuto J, Ferrão LFV, Jacobs M, Babinski M, Magallanes-Lundback M, Bayer P, Golicz A, Humann JL, Main D, Espley RV, Chagné D, Albert NW, Montanari S, Vorsa N, Polashock J, Díaz-Garcia L, Zalapa J, Bassil NV, Munoz PR, Iorizzo M, Edger PP. Blueberry and cranberry pangenomes as a resource for future genetic studies and breeding efforts. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.31.551392. [PMID: 37577683 PMCID: PMC10418200 DOI: 10.1101/2023.07.31.551392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
Domestication of cranberry and blueberry began in the United States in the early 1800s and 1900s, respectively, and, owing in part to their flavors and health-promoting benefits, both crops are now cultivated and consumed worldwide. The industry continues to face a wide variety of production challenges (e.g. disease pressures) as well as a demand for higher-yielding cultivars with improved fruit quality characteristics. Unfortunately, molecular tools to help guide breeding efforts for these species have been relatively limited compared with those for other high-value crops. Here, we describe the construction and analysis of the first pangenome for both blueberry and cranberry. Our analysis of these pangenomes revealed that both crops exhibit great genetic diversity, including presence-absence variation of 48.4% of genes in highbush blueberry and 47.0% of genes in cranberry. Auxiliary genes, those not shared by all cultivars, are significantly enriched with molecular functions associated with disease resistance and the biosynthesis of specialized metabolites, including compounds previously associated with improving fruit quality traits. The discovery of thousands of genes not present in the previous reference genomes for blueberry and cranberry will serve as the basis of future research and as potential targets for future breeding efforts. The pangenome, as a multiple-sequence alignment, as well as individual annotated genomes, are publicly available for analysis on the Genome Database for Vaccinium, a curated and integrated web-based relational database. Lastly, the core-gene predictions from the pangenomes will be useful for developing a community genotyping platform to guide future molecular breeding efforts across the family.
Affiliation(s)
- Alan E. Yocca
- Department of Horticulture, Michigan State University, East Lansing, MI, 48824, USA
- Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA
| | - Adrian Platts
- Department of Horticulture, Michigan State University, East Lansing, MI, 48824, USA
- Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA
| | - Elizabeth Alger
- Department of Horticulture, Michigan State University, East Lansing, MI, 48824, USA
| | - Scott Teresi
- Department of Horticulture, Michigan State University, East Lansing, MI, 48824, USA
- Genetics and Genome Sciences, Michigan State University, East Lansing, MI, 48824, USA
| | - Molla F. Mengist
- Plants for Human Health Institute, North Carolina State University, Kannapolis, NC USA
| | - Juliana Benevenuto
- Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA
| | - Luis Felipe V. Ferrão
- Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA
| | - MacKenzie Jacobs
- Department of Horticulture, Michigan State University, East Lansing, MI, 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA
| | - Michal Babinski
- Department of Horticulture, Michigan State University, East Lansing, MI, 48824, USA
| | | | - Philipp Bayer
- University of Western Australia, Perth 6009 Australia
| | | | - Jodi L Humann
- Department of Horticulture, Washington State University, Pullman, WA, 99163, USA
| | - Dorrie Main
- Department of Horticulture, Washington State University, Pullman, WA, 99163, USA
| | - Richard V. Espley
- The New Zealand Institute for Plant and Food Research Limited (PFR), Auckland, New Zealand
| | - David Chagné
- The New Zealand Institute for Plant and Food Research Limited (PFR), Palmerston, New Zealand
| | - Nick W. Albert
- The New Zealand Institute for Plant and Food Research Limited (PFR), Palmerston, New Zealand
| | - Sara Montanari
- The New Zealand Institute for Plant and Food Research Limited (PFR), Motueka, New Zealand
| | - Nicholi Vorsa
- SEBS, Plant Biology, Rutgers University, New Brunswick NJ 01019 USA
| | - James Polashock
- SEBS, Plant Biology, Rutgers University, New Brunswick NJ 01019 USA
| | - Luis Díaz-Garcia
- USDA-ARS, VCRU, Department of Horticulture, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Juan Zalapa
- USDA-ARS, VCRU, Department of Horticulture, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Nahla V. Bassil
- USDA-ARS, National Clonal Germplasm Repository, Corvallis, OR 97333, USA
| | - Patricio R. Munoz
- Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA
| | - Massimo Iorizzo
- Plants for Human Health Institute, North Carolina State University, Kannapolis, NC USA
- Department of Horticulture, North Carolina State University, Kannapolis, NC USA
| | - Patrick P. Edger
- Department of Horticulture, Michigan State University, East Lansing, MI, 48824, USA
- Genetics and Genome Sciences, Michigan State University, East Lansing, MI, 48824, USA
- MSU AgBioResearch, Michigan State University, East Lansing, MI, 48824, USA
| |
|
43
|
Jaiswal M, Kumar S. smAMPsTK: a toolkit to unravel the smORFome encoding AMPs of plant species. J Biomol Struct Dyn 2023:1-13. [PMID: 37464885 DOI: 10.1080/07391102.2023.2235605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2023] [Accepted: 07/06/2023] [Indexed: 07/20/2023]
Abstract
The pervasive repertoire of plant molecules with the potential to serve as a substitute for conventional antibiotics has led to obtaining better insights into plant-derived antimicrobial peptides (AMPs). The massive distribution of Small Open Reading Frames (smORFs) throughout eukaryotic genomes with proven extensive biological functions reflects their practicality as antimicrobials. Here, we have developed a pipeline named smAMPsTK to unveil the underlying hidden smORFs encoding AMPs for plant species. By applying this pipeline, we have elicited AMPs of various functional activity of lengths ranging from 5 to 100 aa by employing publicly available transcriptome data of five different angiosperms. Later, we studied the coding potential of AMPs-smORFs, the inclusion of diverse translation initiation start codons, and amino acid frequency. Codon usage study signifies no such codon usage biases for smORFs encoding AMPs. Majorly three start codons are prominent in generating AMPs. The evolutionary and conservational study proclaimed the widespread distribution of AMPs encoding genes throughout the plant kingdom. Domain analysis revealed that nearly all AMPs have chitin-binding ability, establishing their role as antifungal agents. The current study includes a developed methodology to characterize smORFs encoding AMPs, and their implications as antimicrobial, antibacterial, antifungal, or antiviral provided by SVM score and prediction status calculated by machine learning-based prediction models. The pipeline, complete package, and the results derived for five angiosperms are freely available at https://github.com/skbinfo/smAMPsTK. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Mohini Jaiswal
- Bioinformatics Laboratory, National Institute of Plant Genome Research (NIPGR), Aruna Asaf Ali Marg, New Delhi, India
| | - Shailesh Kumar
- Bioinformatics Laboratory, National Institute of Plant Genome Research (NIPGR), Aruna Asaf Ali Marg, New Delhi, India
| |
|
44
|
Athanasouli M, Akduman N, Röseler W, Theam P, Rödelsperger C. Thousands of Pristionchus pacificus orphan genes were integrated into developmental networks that respond to diverse environmental microbiota. PLoS Genet 2023; 19:e1010832. [PMID: 37399201 DOI: 10.1371/journal.pgen.1010832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Accepted: 06/15/2023] [Indexed: 07/05/2023] Open
Abstract
Adaptation of organisms to environmental change may be facilitated by the creation of new genes. New genes without homologs in other lineages are known as taxonomically-restricted orphan genes and may result from divergence or de novo formation. Previously, we have extensively characterized the evolution and origin of such orphan genes in the nematode model organism Pristionchus pacificus. Here, we employ large-scale transcriptomics to establish potential functional associations and to measure the degree of transcriptional plasticity among orphan genes. Specifically, we analyzed 24 RNA-seq samples from adult P. pacificus worms raised on 24 different monoxenic bacterial cultures. Based on coexpression analysis, we identified 28 large modules that harbor 3,727 diplogastrid-specific orphan genes and that respond dynamically to different bacteria. These coexpression modules have distinct regulatory architecture and also exhibit differential expression patterns across development, suggesting a link between bacterial response networks and development. Phylostratigraphy revealed a considerably high number of family- and even species-specific orphan genes in certain coexpression modules. This suggests that new genes are not attached randomly to existing cellular networks and that integration can happen very fast. Integrative analysis of protein domains, gene expression and ortholog data facilitated the assignment of biological labels for 22 coexpression modules, with one of the largest, fast-evolving modules being associated with spermatogenesis. In summary, this work presents the first functional annotation for thousands of P. pacificus orphan genes and reveals insights into their integration into environmentally responsive gene networks.
Affiliation(s)
- Marina Athanasouli
- Department for Integrative Evolutionary Biology, Max Planck Institute for Biology, Tübingen, Germany
| | - Nermin Akduman
- Department for Integrative Evolutionary Biology, Max Planck Institute for Biology, Tübingen, Germany
| | - Waltraud Röseler
- Department for Integrative Evolutionary Biology, Max Planck Institute for Biology, Tübingen, Germany
| | - Penghieng Theam
- Department for Integrative Evolutionary Biology, Max Planck Institute for Biology, Tübingen, Germany
| | - Christian Rödelsperger
- Department for Integrative Evolutionary Biology, Max Planck Institute for Biology, Tübingen, Germany
| |
|
45
|
Peng J, Zhao L. The origin and structural evolution of de novo genes in Drosophila. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.13.532420. [PMID: 37425675 PMCID: PMC10326970 DOI: 10.1101/2023.03.13.532420] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 07/11/2023]
Abstract
Although previously thought to be unlikely, recent studies have shown that de novo gene origination from previously non-genic sequences is a relatively common mechanism for gene innovation in many species and taxa. These young genes provide a unique set of candidates to study the structural and functional origination of proteins. However, our understanding of their protein structures and how these structures originate and evolve is still limited, due to a lack of systematic studies. Here, we combined high-quality base-level whole genome alignments, bioinformatic analysis, and computational structure modeling to study the origination, evolution, and protein structure of lineage-specific de novo genes. We identified 555 de novo gene candidates in D. melanogaster that originated within the Drosophilinae lineage. We found a gradual shift in sequence composition, evolutionary rates, and expression patterns with their gene ages, which indicates possible gradual shifts or adaptations of their functions. Surprisingly, we found little overall protein structural change for de novo genes in the Drosophilinae lineage. Using Alphafold2, ESMFold, and molecular dynamics, we identified a number of de novo gene candidates with protein products that are potentially well-folded, many of which are more likely to contain transmembrane and signal proteins compared to other annotated protein-coding genes. Using ancestral sequence reconstruction, we found that most potentially well-folded proteins are often born folded. Interestingly, we observed one case where disordered ancestral proteins become ordered within a relatively short evolutionary time. Single-cell RNA-seq analysis in testis showed that although most de novo genes are enriched in spermatocytes, several young de novo genes are biased in the early spermatogenesis stage, indicating potentially important but less emphasized roles of early germline cells in the de novo gene origination in testis. 
This study provides a systematic overview of the origin, evolution, and structural changes of Drosophilinae-specific de novo genes.
Affiliation(s)
- Junhui Peng
- Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY 10065, USA
| | - Li Zhao
- Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY 10065, USA
| |
|
46
|
Ardern Z, Uz-Zaman MH. Between noise and function: Toward a taxonomy of the non-canonical translatome. Cell Syst 2023; 14:343-345. [PMID: 37201506 DOI: 10.1016/j.cels.2023.04.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 04/17/2023] [Indexed: 05/20/2023]
Abstract
Eukaryotic genomes are pervasively translated, but the properties of translated sequences outside of canonical genes are poorly understood. A new study in Cell Systems reveals a large translatome that is not under significant evolutionary constraint but is still an active part of diverse cellular systems.
Affiliation(s)
- Zachary Ardern
  - Parasites and Microbes Programme, Wellcome Sanger Institute, Hinxton, Cambridgeshire, UK
- Md Hassan Uz-Zaman
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA
|
47
|
Wacholder A, Parikh SB, Coelho NC, Acar O, Houghton C, Chou L, Carvunis AR. A vast evolutionarily transient translatome contributes to phenotype and fitness. Cell Syst 2023; 14:363-381.e8. [PMID: 37164009 PMCID: PMC10348077 DOI: 10.1016/j.cels.2023.04.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/18/2022] [Revised: 01/30/2023] [Accepted: 04/06/2023] [Indexed: 05/12/2023]
Abstract
Translation is the process by which ribosomes synthesize proteins. Ribosome profiling recently revealed that many short sequences previously thought to be noncoding are pervasively translated. To identify protein-coding genes in this noncanonical translatome, we combine an integrative framework for extremely sensitive ribosome profiling analysis, iRibo, with high-powered selection inferences tailored for short sequences. We construct a reference translatome for Saccharomyces cerevisiae comprising 5,400 canonical and almost 19,000 noncanonical translated elements. Only 14 noncanonical elements were evolving under detectable purifying selection. A representative subset of translated elements lacking signatures of selection demonstrated involvement in processes including DNA repair, stress response, and post-transcriptional regulation. Our results suggest that most translated elements are not conserved protein-coding genes and contribute to genotype-phenotype relationships through fast-evolving molecular mechanisms.
Affiliation(s)
- Aaron Wacholder
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Saurin Bipin Parikh
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Integrative Systems Biology Program, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Nelson Castilho Coelho
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Omer Acar
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Joint CMU-Pitt PhD Program in Computational Biology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Carly Houghton
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Joint CMU-Pitt PhD Program in Computational Biology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Lin Chou
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Integrative Systems Biology Program, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Anne-Ruxandra Carvunis
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
|
48
|
Lombardo KD, Sheehy HK, Cridland JM, Begun DJ. Identifying candidate de novo genes expressed in the somatic female reproductive tract of Drosophila melanogaster. bioRxiv 2023:2023.05.03.539262. [PMID: 37205537 PMCID: PMC10187257 DOI: 10.1101/2023.05.03.539262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
Most eukaryotic genes have been vertically transmitted to the present from distant ancestors. However, variable gene number across species indicates that gene gain and loss also occur. While new genes typically originate as products of duplications and rearrangements of pre-existing genes, putative de novo genes (genes born out of previously non-genic sequence) have been identified. Previous studies of de novo genes in Drosophila have provided evidence that expression in male reproductive tissues is common. However, no studies have focused on female reproductive tissues. Here we begin addressing this gap in the literature by analyzing the transcriptomes of three female reproductive tract organs (spermatheca, seminal receptacle, and parovaria) in three species: our focal species, D. melanogaster, and two closely related species, D. simulans and D. yakuba, with the goal of identifying putative D. melanogaster-specific de novo genes expressed in these tissues. We discovered several candidate genes, which, consistent with the literature, tend to be short, simple, and lowly expressed. We also find evidence that some of these genes are expressed in other D. melanogaster tissues and in both sexes. The relatively small number of candidate genes discovered here is similar to that observed in the accessory gland, but substantially smaller than that observed in the testis.
Affiliation(s)
- Kaelina D Lombardo
  - Department of Evolution and Ecology, University of California, Davis, CA 95616
- Hayley K Sheehy
  - Department of Evolution and Ecology, University of California, Davis, CA 95616
- Julie M Cridland
  - Department of Evolution and Ecology, University of California, Davis, CA 95616
- David J Begun
  - Department of Evolution and Ecology, University of California, Davis, CA 95616
|
49
|
Saeki N, Yamamoto C, Eguchi Y, Sekito T, Shigenobu S, Yoshimura M, Yashiroda Y, Boone C, Moriya H. Overexpression profiling reveals cellular requirements in the context of genetic backgrounds and environments. PLoS Genet 2023; 19:e1010732. [PMID: 37115757 PMCID: PMC10171610 DOI: 10.1371/journal.pgen.1010732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2022] [Revised: 05/10/2023] [Accepted: 04/04/2023] [Indexed: 04/29/2023] Open
Abstract
Overexpression can help organisms adapt to stressful environments, making an examination of overexpressed genes valuable for understanding stress tolerance mechanisms. However, a systematic study of genes whose overexpression is functionally adaptive (GOFAs) under stress has yet to be conducted. We developed a new overexpression profiling method and systematically identified GOFAs in Saccharomyces cerevisiae under stress (heat, salt, and oxidative). Our results show that adaptive overexpression compensates for deficiencies and increases fitness under stress, as illustrated by calcium under salt stress. We also investigated the impact of different genetic backgrounds on GOFAs, which varied among three S. cerevisiae strains, reflecting differing calcium and potassium requirements for salt stress tolerance. Our study of a knockout collection also suggested that calcium prevents mitochondrial outbursts under salt stress. Supporting this idea, mitochondria-enhancing GOFAs were adaptive only when adequate calcium was available and non-adaptive when calcium was deficient. Our findings indicate that adaptive overexpression meets the cell's needs for maximizing the organism's adaptive capacity in the given environment and genetic context.
Affiliation(s)
- Nozomu Saeki
  - Graduate School of Environmental and Life Science, Okayama University, Okayama, Japan
- Chie Yamamoto
  - Graduate School of Environmental and Life Science, Okayama University, Okayama, Japan
- Yuichi Eguchi
  - Biomedical Business Center, RICOH Futures BU, Kanagawa, Japan
- Takayuki Sekito
  - Graduate School of Agriculture, Ehime University, Matsuyama, Japan
- Mami Yoshimura
  - RIKEN Center for Sustainable Resource Science, Wako, Japan
- Yoko Yashiroda
  - RIKEN Center for Sustainable Resource Science, Wako, Japan
- Charles Boone
  - RIKEN Center for Sustainable Resource Science, Wako, Japan
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Canada
- Hisao Moriya
  - Faculty of Environmental, Life, Natural Science and Technology, Okayama University, Okayama, Japan
|
50
|
Crespo-Bellido A, Duffy S. The how of counter-defense: viral evolution to combat host immunity. Curr Opin Microbiol 2023; 74:102320. [PMID: 37075547 DOI: 10.1016/j.mib.2023.102320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2023] [Revised: 03/10/2023] [Accepted: 03/23/2023] [Indexed: 04/21/2023]
Abstract
Viruses are locked in an evolutionary arms race with their hosts. What ultimately determines viral evolvability, or capacity for adaptive evolution, is their ability to efficiently explore and expand sequence space while under the selective regime imposed by their ecology, which includes innate and adaptive host defenses. Viral genomes have significantly higher evolutionary rates than their host counterparts and should have advantages relative to their slower-evolving hosts. However, functional constraints on virus evolutionary landscapes along with the modularity and mutational tolerance of host defense proteins may help offset the advantage conferred to viruses by high evolutionary rates. Additionally, cellular life forms from all domains of life possess many highly complex defense mechanisms that act as hurdles to viral replication. Consequently, viruses constantly probe sequence space through mutation and genetic exchange and are under pressure to optimize diverse counter-defense strategies.
Affiliation(s)
- Alvin Crespo-Bellido
  - Department of Ecology, Evolution and Natural Resources, School of Environmental and Biological Sciences, Rutgers, the State University of New Jersey, New Brunswick, NJ, USA
- Siobain Duffy
  - Department of Ecology, Evolution and Natural Resources, School of Environmental and Biological Sciences, Rutgers, the State University of New Jersey, New Brunswick, NJ, USA
|