1
Vara C, Montañés JC, Albà MM. High Polymorphism Levels of De Novo ORFs in a Yoruba Human Population. Genome Biol Evol 2024; 16:evae126. [PMID: 38934859 DOI: 10.1093/gbe/evae126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 05/08/2024] [Accepted: 06/01/2024] [Indexed: 06/28/2024] Open
Abstract
During evolution, new open reading frames (ORFs) with the potential to give rise to novel proteins continuously emerge. A recent compilation of noncanonical ORFs with translation signatures in humans has identified thousands of cases with a putative de novo origin. However, it is not known how they are distributed in the population. Are they universally translated? Here, we use ribosome profiling data from 65 lymphoblastoid cell lines from individuals of Yoruba origin to investigate this question. We identify 2,587 de novo ORFs translated in at least one of the cell lines. In line with their de novo origin, the encoded proteins tend to be smaller than 100 amino acids and positively charged. We observe that the de novo ORFs are more polymorphic in the population than the set of canonical proteins, with a substantial fraction of them being translated in only some of the cell lines. Remarkably, this difference remains significant after controlling for differences in the translation levels. These results suggest that variations in the level of translation of de novo ORFs could be a relevant source of intraspecies phenotypic diversity in humans.
Affiliation(s)
- Covadonga Vara
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute, Barcelona, Spain
- José Carlos Montañés
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute, Barcelona, Spain
- M Mar Albà
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute, Barcelona, Spain
  - Catalan Institute for Research and Advanced Studies (ICREA), Barcelona, Spain
2
Lee U, Mozeika SM, Zhao L. A Synergistic, Cultivator Model of De Novo Gene Origination. Genome Biol Evol 2024; 16:evae103. [PMID: 38748819 PMCID: PMC11152449 DOI: 10.1093/gbe/evae103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/12/2024] [Indexed: 06/07/2024] Open
Abstract
The origin and fixation of evolutionarily young genes is a fundamental question in evolutionary biology. However, understanding the origins of newly evolved genes arising de novo from noncoding genomic sequences is challenging. This is partly due to the low likelihood that several neutral or nearly neutral mutations fix prior to the appearance of an important novel molecular function. This issue is particularly exacerbated in large effective population sizes where the effect of drift is small. To address this problem, we propose a regulation-focused, cultivator model for de novo gene evolution. This cultivator-focused model posits that each step in a novel variant's evolutionary trajectory is driven by well-defined, selectively advantageous functions for the cultivator genes, rather than solely by the de novo genes, emphasizing the critical role of genome organization in the evolution of new genes.
Affiliation(s)
- UnJin Lee
  - Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA
- Shawn M Mozeika
  - Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA
- Li Zhao
  - Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA
3
Chen J, Li Q, Xia S, Arsala D, Sosa D, Wang D, Long M. The Rapid Evolution of De Novo Proteins in Structure and Complex. Genome Biol Evol 2024; 16:evae107. [PMID: 38753069 PMCID: PMC11149777 DOI: 10.1093/gbe/evae107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/10/2024] [Indexed: 06/06/2024] Open
Abstract
Recent genome-wide studies in rice have established that de novo genes, evolving from noncoding sequences, enhance protein diversity through a stepwise process. However, the pattern and rate of their evolution in protein structure over time remain unclear. Here, we addressed these issues within a surprisingly short evolutionary timescale (<1 million years for 97% of Oryza de novo genes) with comparative approaches to gene duplicates. We found that de novo genes evolve faster than gene duplicates in the intrinsically disordered regions (such as random coils), secondary structure elements (such as α helix and β strand), hydrophobicity, and molecular recognition features. In de novo proteins, specifically, we observed an 8% to 14% decay in random coils and intrinsically disordered region lengths and a 2.3% to 6.5% increase in structured elements, hydrophobicity, and molecular recognition features, per million years on average. These patterns of structural evolution align with changes in amino acid composition over time as well. We also revealed higher positive charges but smaller molecular weights for de novo proteins than duplicates. Tertiary structure predictions showed that most de novo proteins, though not typically well folded on their own, readily form low-energy and compact complexes with other proteins facilitated by extensive residue contacts and conformational flexibility, suggesting a faster-binding scenario in de novo proteins to promote interaction. These analyses illuminate a rapid evolution of protein structure in de novo genes in rice genomes, originating from noncoding sequences, highlighting their quick transformation into active, protein complex-forming components within a remarkably short evolutionary timeframe.
Affiliation(s)
- Jianhai Chen
  - Department of Ecology and Evolution, The University of Chicago, Chicago, IL 60637, USA
- Qingrong Li
  - Division of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA
  - Department of Cellular & Molecular Medicine, School of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Shengqian Xia
  - Department of Ecology and Evolution, The University of Chicago, Chicago, IL 60637, USA
- Deanna Arsala
  - Department of Ecology and Evolution, The University of Chicago, Chicago, IL 60637, USA
- Dylan Sosa
  - Department of Ecology and Evolution, The University of Chicago, Chicago, IL 60637, USA
- Dong Wang
  - Division of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA
  - Department of Cellular & Molecular Medicine, School of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Manyuan Long
  - Department of Ecology and Evolution, The University of Chicago, Chicago, IL 60637, USA
4
uz-Zaman MH, D’Alton S, Barrick JE, Ochman H. Promoter recruitment drives the emergence of proto-genes in a long-term evolution experiment with Escherichia coli. PLoS Biol 2024; 22:e3002418. [PMID: 38713714 PMCID: PMC11101190 DOI: 10.1371/journal.pbio.3002418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 05/17/2024] [Accepted: 04/18/2024] [Indexed: 05/09/2024] Open
Abstract
The phenomenon of de novo gene birth, the emergence of genes from non-genic sequences, has received considerable attention due to the widespread occurrence of genes that are unique to particular species or genomes. Most instances of de novo gene birth have been recognized through comparative analyses of genome sequences in eukaryotes, despite the abundance of novel, lineage-specific genes in bacteria and the relative ease with which bacteria can be studied in an experimental context. Here, we explore the genetic record of the Escherichia coli long-term evolution experiment (LTEE) for changes indicative of "proto-genic" phases of new gene birth in which non-genic sequences evolve stable transcription and/or translation. Over the time span of the LTEE, non-genic regions are frequently transcribed, translated and differentially expressed, with levels of transcription across low-expressed regions increasing in later generations of the experiment. Proto-genes formed downstream of new mutations result either from insertion element activity or chromosomal translocations that fused preexisting regulatory sequences to regions that were not expressed in the LTEE ancestor. Additionally, we identified instances of proto-gene emergence in which a previously unexpressed sequence was transcribed after formation of an upstream promoter, although such cases were rare compared to those caused by recruitment of preexisting promoters. Tracing the origin of the causative mutations, we discovered that most occurred early in the history of the LTEE, often within the first 20,000 generations, and became fixed soon after emergence. Our findings show that proto-genes emerge frequently within evolving populations, can persist stably, and can serve as potential substrates for new gene formation.
Affiliation(s)
- Md. Hassan uz-Zaman
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
- Simon D’Alton
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
- Jeffrey E. Barrick
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
- Howard Ochman
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
5
Aubel M, Buchel F, Heames B, Jones A, Honc O, Bornberg-Bauer E, Hlouchova K. High-throughput Selection of Human de novo-emerged sORFs with High Folding Potential. Genome Biol Evol 2024; 16:evae069. [PMID: 38597156 PMCID: PMC11024478 DOI: 10.1093/gbe/evae069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 03/11/2024] [Accepted: 03/23/2024] [Indexed: 04/11/2024] Open
Abstract
De novo genes emerge from previously noncoding stretches of the genome. Their encoded de novo proteins are generally expected to be similar to random sequences and, accordingly, to have no stable tertiary fold and high predicted disorder. However, structural properties of de novo proteins and whether they differ during the stages of emergence and fixation have not been studied in depth and rely heavily on predictions. Here we generated a library of short human putative de novo proteins of varying lengths and ages and sorted the candidates according to their structural compactness and disorder propensity. Using Förster resonance energy transfer combined with fluorescence-activated cell sorting, we were able to screen the library for the most compact protein structures, as well as the most elongated and flexible structures. We find that compact de novo proteins are on average slightly shorter and contain lower predicted disorder than less compact ones. The predicted structures for most and least compact de novo proteins correspond to expectations in that they contain more secondary structure content or higher disorder content, respectively. Our experiments indicate that older de novo proteins have higher compactness and structural propensity compared with young ones. We discuss possible evolutionary scenarios and their implications underlying the age-dependencies of compactness and structural content of putative de novo proteins.
Affiliation(s)
- Margaux Aubel
  - Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
- Filip Buchel
  - Department of Cell Biology, Faculty of Science, Charles University, Prague, Czech Republic
  - Department of Biochemistry, Faculty of Science, Charles University, Prague, Czech Republic
- Brennen Heames
  - Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
- Alun Jones
  - Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
- Ondrej Honc
  - Imaging Methods Core Facility, BIOCEV, Prague, Czech Republic
- Erich Bornberg-Bauer
  - Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
  - Department of Protein Evolution, Max Planck-Institute for Biology Tuebingen, Tuebingen, Germany
- Klara Hlouchova
  - Department of Cell Biology, Faculty of Science, Charles University, Prague, Czech Republic
  - Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague, Czech Republic
6
Camellato BR, Brosh R, Ashe HJ, Maurano MT, Boeke JD. Synthetic reversed sequences reveal default genomic states. Nature 2024; 628:373-380. [PMID: 38448583 PMCID: PMC11006607 DOI: 10.1038/s41586-024-07128-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/27/2022] [Accepted: 01/29/2024] [Indexed: 03/08/2024]
Abstract
Pervasive transcriptional activity is observed across diverse species. The genomes of extant organisms have undergone billions of years of evolution, making it unclear whether these genomic activities represent effects of selection or 'noise'. Characterizing default genome states could help understand whether pervasive transcriptional activity has biological meaning. Here we addressed this question by introducing a synthetic 101-kb locus into the genomes of Saccharomyces cerevisiae and Mus musculus and characterizing genomic activity. The locus was designed by reversing but not complementing human HPRT1, including its flanking regions, thus retaining basic features of the natural sequence but ablating evolved coding or regulatory information. We observed widespread activity of both reversed and native HPRT1 loci in yeast, despite the lack of evolved yeast promoters. By contrast, the reversed locus displayed no activity at all in mouse embryonic stem cells, and instead exhibited repressive chromatin signatures. The repressive signature was alleviated in a locus variant lacking CpG dinucleotides; nevertheless, this variant was also transcriptionally inactive. These results show that synthetic genomic sequences that lack coding information are active in yeast, but inactive in mouse embryonic stem cells, consistent with a major difference in 'default genomic states' between these two divergent eukaryotic cell types, with implications for understanding pervasive transcription, horizontal transfer of genetic information and the birth of new genes.
Affiliation(s)
- Ran Brosh
  - Institute for Systems Genetics, NYU Langone Health, New York, NY, USA
- Hannah J Ashe
  - Institute for Systems Genetics, NYU Langone Health, New York, NY, USA
- Matthew T Maurano
  - Institute for Systems Genetics, NYU Langone Health, New York, NY, USA
  - Department of Pathology, NYU Langone Health, New York, NY, USA
- Jef D Boeke
  - Institute for Systems Genetics, NYU Langone Health, New York, NY, USA
  - Department of Biochemistry and Molecular Pharmacology, NYU Langone Health, New York, NY, USA
  - Department of Biomedical Engineering, NYU Tandon School of Engineering, New York, NY, USA
7
Rives N, Lamba V, Christina Cheng CH, Zhuang X. Diverse origins of near-identical antifreeze proteins in unrelated fish lineages provide insights into evolutionary mechanisms of new gene birth and protein sequence convergence. bioRxiv 2024:2024.03.12.584730. [PMID: 38559027 PMCID: PMC10980009 DOI: 10.1101/2024.03.12.584730] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Determining the origins of novel genes and the genetic mechanisms underlying the emergence of new functions is challenging yet crucial for understanding evolutionary innovations. The novel fish antifreeze proteins, exemplifying convergent evolution, represent excellent opportunities to investigate the evolutionary origins and pathways of new genes. Particularly notable are the near-identical type I antifreeze proteins (AFPI) in four phylogenetically divergent fish taxa. This study tested the hypothesis of protein sequence convergence beyond functional convergence in three unrelated AFPI-bearing fish lineages, revealing different paths by which a similar protein arose from diverse genomic resources. Comprehensive comparative analyses of the de novo sequenced genomes of the winter flounder and grubby sculpin, the available high-quality genome of the cunner, and those of 14 other relevant species found that the near-identical AFPI originated from a distinct genetic precursor in each lineage, and independently evolved coding regions for the novel ice-binding protein while retaining sequence identity in the regulatory regions with their respective ancestor. The deduced evolutionary processes and molecular mechanisms are consistent with the Innovation-Amplification-Divergence (IAD) model applicable to AFPI formation in all three lineages, a new Duplication-Degeneration-Divergence (DDD) model we propose for the sculpin lineage, and a DDD model with gene fission for the cunner lineage. This investigation illustrates the multiple ways by which a novel functional gene with sequence convergence at the protein level could evolve across divergent species, advancing our understanding of the mechanistic intricacies in new gene formation.
Affiliation(s)
- Nathan Rives
  - Department of Biological Sciences, University of Arkansas, Fayetteville, AR, USA
- Vinita Lamba
  - Department of Biological Sciences, University of Arkansas, Fayetteville, AR, USA
- C.-H. Christina Cheng
  - Department of Evolution, Ecology and Behavior, University of Illinois, Urbana-Champaign, IL, USA
- Xuan Zhuang
  - Department of Biological Sciences, University of Arkansas, Fayetteville, AR, USA
8
Liu X, Xiao C, Xu X, Zhang J, Mo F, Chen JY, Delihas N, Zhang L, An NA, Li CY. Origin of functional de novo genes in humans from "hopeful monsters". Wiley Interdiscip Rev RNA 2024; 15:e1845. [PMID: 38605485 DOI: 10.1002/wrna.1845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 03/13/2024] [Accepted: 03/18/2024] [Indexed: 04/13/2024]
Abstract
For a long time, it was believed that new genes arise only from modifications of preexisting genes, but the discovery of de novo protein-coding genes that originated from noncoding DNA regions demonstrates the existence of a "motherless" origination process for new genes. However, the features, distributions, expression profiles, and origin modes of these genes in humans seem to support the notion that their origin is not a purely "motherless" process; rather, these genes arise preferentially from genomic regions encoding preexisting precursors with gene-like features. In such a case, the gene loci are typically not brand new. In this short review, we will summarize the definition and features of human de novo genes and clarify their process of origination from ancestral non-coding genomic regions. In addition, we define the favored precursors, or "hopeful monsters," for the origin of de novo genes and present a discussion of the functional significance of these young genes in brain development and tumorigenesis in humans. This article is categorized under: RNA Evolution and Genomics > RNA and Ribonucleoprotein Evolution.
Affiliation(s)
- Xiaoge Liu
  - State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Chunfu Xiao
  - State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Xinwei Xu
  - State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Jie Zhang
  - State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Fan Mo
  - State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Jia-Yu Chen
  - State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Chemistry and Biomedicine Innovation Center (ChemBIC), Nanjing University, Nanjing, China
- Nicholas Delihas
  - Department of Microbiology and Immunology, Renaissance School of Medicine, Stony Brook University, Stony Brook, New York, USA
- Li Zhang
  - Chinese Institute for Brain Research, Beijing, China
- Ni A An
  - State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Chuan-Yun Li
  - State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
  - Chinese Institute for Brain Research, Beijing, China
  - Southwest United Graduate School, Kunming, China
9
Peng J, Zhao L. The origin and structural evolution of de novo genes in Drosophila. Nat Commun 2024; 15:810. [PMID: 38280868 PMCID: PMC10821953 DOI: 10.1038/s41467-024-45028-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Accepted: 01/09/2024] [Indexed: 01/29/2024] Open
Abstract
Recent studies reveal that de novo gene origination from previously non-genic sequences is a common mechanism for gene innovation. These young genes provide an opportunity to study the structural and functional origins of proteins. Here, we combine high-quality base-level whole-genome alignments and computational structural modeling to study the origination, evolution, and protein structures of lineage-specific de novo genes. We identify 555 de novo gene candidates in D. melanogaster that originated within the Drosophilinae lineage. Sequence composition, evolutionary rates, and expression patterns indicate possible gradual functional or adaptive shifts with their gene ages. Surprisingly, we find little overall protein structural changes in candidates from the Drosophilinae lineage. We identify several candidates with potentially well-folded protein structures. Ancestral sequence reconstruction analysis reveals that most potentially well-folded candidates are often born well-folded. Single-cell RNA-seq analysis in testis shows that although most de novo gene candidates are enriched in spermatocytes, several young candidates are biased towards the early spermatogenesis stage, indicating potentially important but less emphasized roles of early germline cells in the de novo gene origination in testis. This study provides a systematic overview of the origin, evolution, and protein structural changes of Drosophilinae-specific de novo genes.
Affiliation(s)
- Junhui Peng
  - Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA
- Li Zhao
  - Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA
10
Frumkin I, Laub MT. Selection of a de novo gene that can promote survival of Escherichia coli by modulating protein homeostasis pathways. Nat Ecol Evol 2023; 7:2067-2079. [PMID: 37945946 PMCID: PMC10697842 DOI: 10.1038/s41559-023-02224-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/11/2023] [Accepted: 09/12/2023] [Indexed: 11/12/2023]
Abstract
Cellular novelty can emerge when non-functional loci become functional genes in a process termed de novo gene birth. But how proteins with random amino acid sequences beneficially integrate into existing cellular pathways remains poorly understood. We screened ~10^8 genes, generated from random nucleotide sequences and devoid of homology to natural genes, for their ability to rescue growth arrest of Escherichia coli cells producing the ribonuclease toxin MazF. We identified ~2,000 genes that could promote growth, probably by reducing transcription from the promoter driving toxin expression. Additionally, one random protein, named Random antitoxin of MazF (RamF), modulated protein homeostasis by interacting with chaperones, leading to MazF proteolysis and a consequent loss of its toxicity. Finally, we demonstrate that random proteins can improve during evolution by identifying beneficial mutations that turned RamF into a more efficient inhibitor. Our work provides a mechanistic basis for how de novo gene birth can produce functional proteins that effectively benefit cells evolving under stress.
Affiliation(s)
- Idan Frumkin
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Michael T Laub
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Cambridge, MA, USA
11
Haltom J, Trovao NS, Guarnieri J, Vincent P, Singh U, Tsoy S, O'Leary CA, Bram Y, Widjaja GA, Cen Z, Meller R, Baylin SB, Moss WN, Nikolau BJ, Enguita FJ, Wallace DC, Beheshti A, Schwartz R, Wurtele ES. SARS-CoV-2 Orphan Gene ORF10 Contributes to More Severe COVID-19 Disease. medRxiv 2023:2023.11.27.23298847. [PMID: 38076862 PMCID: PMC10705665 DOI: 10.1101/2023.11.27.23298847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2024]
Abstract
The orphan gene of SARS-CoV-2, ORF10, is the least studied gene in the virus responsible for the COVID-19 pandemic. Recent experimentation indicated ORF10 expression moderates innate immunity in vitro. However, whether ORF10 affects COVID-19 in humans remained unknown. We determine that the ORF10 sequence is identical to the Wuhan-Hu-1 ancestral haplotype in 95% of genomes across five variants of concern (VOC). Four ORF10 variants are associated with less virulent clinical outcomes in the human host: three of these affect ORF10 protein structure, one affects ORF10 RNA structural dynamics. RNA-Seq data from 2070 samples from diverse human cells and tissues reveals ORF10 accumulation is conditionally discordant from that of other SARS-CoV-2 transcripts. Expression of ORF10 in A549 and HEK293 cells perturbs immune-related gene expression networks, alters expression of the majority of mitochondrially-encoded genes of oxidative respiration, and leads to large shifts in levels of 14 newly-identified transcripts. We conclude ORF10 contributes to more severe COVID-19 clinical outcomes in the human host.
Collapse
Affiliation(s)
- Jeffrey Haltom
- Department of Genetics Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
- Center for Mitochondrial and Epigenomic Medicine, Division of Human Genetics, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- COVID-19 International Research Team, Medford, MA 02155, USA
| | - Nidia S Trovao
- Division of International Epidemiology and Population Studies, Fogarty International Center, National Institutes of Health, Bethesda, Maryland, 20892, USA
- COVID-19 International Research Team, Medford, MA 02155, USA
| | - Joseph Guarnieri
- Center for Mitochondrial and Epigenomic Medicine, Division of Human Genetics, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- COVID-19 International Research Team, Medford, MA 02155, USA
| | - Pan Vincent
- Division of International Epidemiology and Population Studies, Fogarty International Center, National Institutes of Health, Bethesda, Maryland, 20892, USA
| | - Urminder Singh
- Bioinformatics and Computational Biology Program, and Genetics Program, Iowa State University, Ames, IA 50011, USA
| | - Sergey Tsoy
- Division of Gastroenterology and Hepatology, Department of Medicine, Weill Cornell Medicine, New York, NY, USA
| | - Collin A O'Leary
- Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
| | - Yaron Bram
- Division of Gastroenterology and Hepatology, Department of Medicine, Weill Cornell Medicine, New York, NY, USA
| | - Gabrielle A Widjaja
- Center for Mitochondrial and Epigenomic Medicine, Division of Human Genetics, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Zimu Cen
- Center for Mitochondrial and Epigenomic Medicine, Division of Human Genetics, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Robert Meller
- Morehouse School of Medicine, Atlanta, GA , 30310-1495, USA
| | - Stephen B Baylin
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins, Baltimore, MD 21231
- Van Andel Research Institute, Grand Rapids, MI 49503
| | - Walter N Moss
- Bioinformatics and Computational Biology Program, and Genetics Program, Iowa State University, Ames, IA 50011, USA
- Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
| | - Basil J Nikolau
- Bioinformatics and Computational Biology Program, and Genetics Program, Iowa State University, Ames, IA 50011, USA
- Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
| | - Francisco J Enguita
- Instituto de Medicina Molecular João Lobo Antunes, Faculdade de Medicina, Universidade de Lisboa, 1649-028 Lisboa, Portugal
| | - Douglas C Wallace
- Center for Mitochondrial and Epigenomic Medicine, Division of Human Genetics, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pediatrics, Division of Human Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104 USA
| | - Afshin Beheshti
- COVID-19 International Research Team, Medford, MA 02155, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Blue Marble Space Institute of Science, Seattle, WA, 98104 USA
| | - Robert Schwartz
- Division of Gastroenterology and Hepatology, Department of Medicine, Weill Cornell Medicine, New York, NY, USA
- Department of Physiology, Biophysics and Systems Biology, Weill Cornell Medicine, New York, NY, USA
- Department of Biomedical Engineering, Cornell University, Ithaca, NY, USA
| | - Eve Syrkin Wurtele
- Bioinformatics and Computational Biology Program, and Genetics Program, Iowa State University, Ames, IA 50011, USA
- Department of Genetics Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
- COVID-19 International Research Team, Medford, MA 02155, USA
|
12
|
Uz-Zaman MH, D'Alton S, Barrick JE, Ochman H. Promoter capture drives the emergence of proto-genes in Escherichia coli. bioRxiv 2023:2023.11.15.567300. [PMID: 38013999 PMCID: PMC10680751 DOI: 10.1101/2023.11.15.567300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
The phenomenon of de novo gene birth, the emergence of genes from non-genic sequences, has received considerable attention due to the widespread occurrence of genes that are unique to particular species or genomes. Most instances of de novo gene birth have been recognized through comparative analyses of genome sequences in eukaryotes, despite the abundance of novel, lineage-specific genes in bacteria and the relative ease with which bacteria can be studied in an experimental context. Here, we explore the genetic record of the Escherichia coli Long-Term Evolution Experiment (LTEE) for changes indicative of "proto-genic" phases of new gene birth in which non-genic sequences evolve stable transcription and/or translation. Over the time-span of the LTEE, non-genic regions are frequently transcribed, translated and differentially expressed, thereby serving as raw material for new gene emergence. Most proto-genes result either from insertion element activity or chromosomal translocations that fused pre-existing regulatory sequences to regions that were not expressed in the LTEE ancestor. Additionally, we identified instances of proto-gene emergence in which a previously unexpressed sequence was transcribed after formation of an upstream promoter. Tracing the origin of the causative mutations, we discovered that most occurred early in the history of the LTEE, often within the first 20,000 generations, and became fixed soon after emergence. Our findings show that proto-genes emerge frequently within evolving populations, persist stably, and can serve as potential substrates for new gene formation.
|
13
|
Mani S, Tlusty T. Gene birth in a model of non-genic adaptation. BMC Biol 2023; 21:257. [PMID: 37957718 PMCID: PMC10644530 DOI: 10.1186/s12915-023-01745-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2022] [Accepted: 10/24/2023] [Indexed: 11/15/2023] Open
Abstract
BACKGROUND: Over evolutionary timescales, genomic loci can switch between functional and non-functional states through processes such as pseudogenization and de novo gene birth. Particularly, de novo gene birth is a widespread process, and many examples continue to be discovered across diverse evolutionary lineages. However, the general mechanisms that lead to functionalization are poorly understood, and estimated rates of de novo gene birth remain contentious. Here, we address this problem within a model that takes into account mutations and structural variation, allowing us to estimate the likelihood of emergence of new functions at non-functional loci. RESULTS: Assuming biologically reasonable mutation rates and mutational effects, we find that functionalization of non-genic loci requires the realization of strict conditions. This is in line with the observation that most de novo genes are localized to the vicinity of established genes. Our model also provides an explanation for the empirical observation that emerging proto-genes are often lost despite showing signs of adaptation. CONCLUSIONS: Our work elucidates the properties of non-genic loci that make them fertile for adaptation, and our results offer mechanistic insights into the process of de novo gene birth.
Affiliation(s)
- Somya Mani
- Center for Soft and Living Matter, Institute for Basic Science, Ulsan 44919, Republic of Korea.
| | - Tsvi Tlusty
- Center for Soft and Living Matter, Institute for Basic Science, Ulsan 44919, Republic of Korea
- Departments of Physics and Chemistry, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
|
14
|
Wang Z, Wang YW, Kasuga T, Lopez-Giraldez F, Zhang Y, Zhang Z, Wang Y, Dong C, Sil A, Trail F, Yarden O, Townsend JP. Lineage-specific genes are clustered with HET-domain genes and respond to environmental and genetic manipulations regulating reproduction in Neurospora. PLoS Genet 2023; 19:e1011019. [PMID: 37934795 PMCID: PMC10684091 DOI: 10.1371/journal.pgen.1011019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Revised: 11/28/2023] [Accepted: 10/16/2023] [Indexed: 11/09/2023] Open
Abstract
Lineage-specific genes (LSGs) have long been postulated to play roles in the establishment of genetic barriers to intercrossing and speciation. In the genome of Neurospora crassa, most of the 670 Neurospora LSGs that are aggregated adjacent to the telomeres are clustered with 61% of the HET-domain genes, some of which regulate self-recognition and define vegetative incompatibility groups. In contrast, the LSG-encoded proteins possess few to no domains that would help to identify potential functional roles. Possible functional roles of LSGs were further assessed by performing transcriptomic profiling in genetic mutants and in response to environmental alterations, as well as examining gene knockouts for phenotypes. Among the 342 LSGs that are dynamically expressed during both asexual and sexual phases, 64% were detectable on unusual carbon sources such as furfural, a wildfire-produced chemical that is a strong inducer of sexual development, and the structurally related furan 5-hydroxymethyl furfural (HMF). Expression of a significant portion of the LSGs was sensitive to light and temperature, factors that also regulate the switch from asexual to sexual reproduction. Furthermore, expression of the LSGs was significantly affected in the knockouts of adv-1 and pp-1 that regulate hyphal communication, and expression of more than one quarter of the LSGs was affected by perturbation of the mating locus. These observations encouraged further investigation of the roles of clustered lineage-specific and HET-domain genes in ecology and reproduction regulation in Neurospora, especially the regulation of the switch from asexual growth to sexual reproduction in response to dramatic changes in environmental conditions.
Affiliation(s)
- Zheng Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
| | - Yen-Wen Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
| | - Takao Kasuga
- College of Biological Sciences, University of California, Davis, California, United States of America
| | - Yang Zhang
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
| | - Zhang Zhang
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
| | - Yaning Wang
- Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
| | - Caihong Dong
- Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
| | - Anita Sil
- Department of Microbiology and Immunology, University of California, San Francisco, California, United States of America
| | - Frances Trail
- Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, Michigan, United States of America
| | - Oded Yarden
- Department of Plant Pathology and Microbiology, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
| | - Jeffrey P. Townsend
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
- Department of Ecology and Evolutionary Biology, Program in Microbiology, and Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
|
15
|
Wang Z, Wang YW, Kasuga T, Hassler H, Lopez-Giraldez F, Dong C, Yarden O, Townsend JP. Origins of lineage-specific elements via gene duplication, relocation, and regional rearrangement in Neurospora crassa. Mol Ecol 2023. [PMID: 37843462 DOI: 10.1111/mec.17168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 09/20/2023] [Accepted: 09/27/2023] [Indexed: 10/17/2023]
Abstract
The origin of new genes has long been a central interest of evolutionary biologists. However, their novelty means that they evade reconstruction by the classical tools of evolutionary modelling. This evasion of deep ancestral investigation necessitates intensive study of model species within well-sampled, recently diversified, clades. One such clade is the model genus Neurospora, members of which lack recent gene duplications. Several Neurospora species are comprehensively characterized organisms apt for studying the evolution of lineage-specific genes (LSGs). Using gene synteny, we documented that 78% of Neurospora LSG clusters are located adjacent to the telomeres featuring extensive tracts of non-coding DNA and duplicated genes. Here, we report several instances of LSGs that are likely from regional rearrangements and potentially from gene rebirth. To broadly investigate the functions of LSGs, we assembled transcriptomics data from 68 experimental data points and identified co-regulatory modules using Weighted Gene Correlation Network Analysis, revealing that LSGs are widely but peripherally involved in known regulatory machinery for diverse functions. The ancestral status of the LSG mas-1, a gene with roles in cell-wall integrity and cellular sensitivity to antifungal toxins, was investigated in detail alongside its genomic neighbours, indicating that it arose from an ancient lysophospholipase precursor that is ubiquitous in lineages of the Sordariomycetes. Our discoveries illuminate a "rummage region" in the N. crassa genome that enables the formation of new genes and functions to arise via gene duplication and relocation, followed by fast mutation and recombination facilitated by sequence repeats and unconstrained non-coding sequences.
Affiliation(s)
- Zheng Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
| | - Yen-Wen Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
| | - Takao Kasuga
- College of Biological Sciences, University of California, Davis, Davis, California, USA
| | - Hayley Hassler
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
| | - Caihong Dong
- Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
| | - Oded Yarden
- Department of Plant Pathology and Microbiology, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
| | - Jeffrey P Townsend
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
- Department of Ecology and Evolutionary Biology, Program in Microbiology, and Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
|
16
|
Grill S, Riley A, Selvaraj M, Lehmann R. HP6/Umbrea is dispensable for viability and fertility, suggesting essentiality of newly evolved genes is rare. Proc Natl Acad Sci U S A 2023; 120:e2309478120. [PMID: 37725638 PMCID: PMC10523450 DOI: 10.1073/pnas.2309478120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Accepted: 07/15/2023] [Indexed: 09/21/2023] Open
Abstract
The newly evolved gene Heterochromatin Protein 6 (HP6), which has been previously classified as essential, challenged the dogma that functions required for viability are only seen in genes with a long evolutionary history. Based on previous RNA-sequencing analysis in Drosophila germ cells, we asked whether HP6 might play a role in germline development. Surprisingly, we found that CRISPR-generated HP6 mutants are viable and fertile. Using previously generated mutants, we identified an independent lethal allele and an RNAi off-target effect that prevented accurate interpretation of HP6 essentiality. By reviewing existing data, we found that the vast majority of young genes that were previously classified as essential were indeed viable when tested with orthologous methods. Together, our data call into question the frequency with which newly evolved genes gain essential functions and suggest that using multiple independent genetic methods is essential when probing the functions of young genes.
Affiliation(s)
- Sherilyn Grill
- Department of Biology, Whitehead Institute, Massachusetts Institute of Technology, Cambridge, MA 02142
| | - Ashley Riley
- Department of Biology, Whitehead Institute, Massachusetts Institute of Technology, Cambridge, MA 02142
| | - Monica Selvaraj
- Department of Biology, Whitehead Institute, Massachusetts Institute of Technology, Cambridge, MA 02142
| | - Ruth Lehmann
- Department of Biology, Whitehead Institute, Massachusetts Institute of Technology, Cambridge, MA 02142
|
17
|
Chen R, Xiao N, Lu Y, Tao T, Huang Q, Wang S, Wang Z, Chuan M, Bu Q, Lu Z, Wang H, Su Y, Ji Y, Ding J, Gharib A, Liu H, Zhou Y, Tang S, Liang G, Zhang H, Yi C, Zheng X, Cheng Z, Xu Y, Li P, Xu C, Huang J, Li A, Yang Z. A de novo evolved gene contributes to rice grain shape difference between indica and japonica. Nat Commun 2023; 14:5906. [PMID: 37737275 PMCID: PMC10516980 DOI: 10.1038/s41467-023-41669-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/07/2023] [Accepted: 09/13/2023] [Indexed: 09/23/2023] Open
Abstract
The role of de novo evolved genes from non-coding sequences in regulating morphological differentiation between species/subspecies remains largely unknown. Here, we show that a rice de novo gene GSE9 contributes to grain shape difference between indica/xian and japonica/geng varieties. GSE9 evolved from a previously non-coding region of wild rice Oryza rufipogon through the acquisition of a start codon. This gene is inherited by most japonica varieties, while the original sequence (absence of start codon, gse9) is present in the majority of indica varieties. Knockout of GSE9 in japonica varieties leads to slender grains, whereas introgression into the indica background results in round grains. Population evolutionary analyses reveal that gse9 and GSE9 are derived from wild rice Or-I and Or-III groups, respectively. Our findings uncover that the de novo GSE9 gene contributes to the genetic and morphological divergence between indica and japonica subspecies, and provide a target for precise manipulation of rice grain shape.
Affiliation(s)
- Rujia Chen
- Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Key Laboratory of Plant Functional Genomics of the Ministry of Education, Agriculture College of Yangzhou University, Yangzhou, 225009, China
- Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops/Jiangsu Key Laboratory of Crop Genetics and Physiology, Yangzhou University, Yangzhou, 225009, China
| | - Ning Xiao
- Institute of Agricultural Sciences for Lixiahe Region in Jiangsu, Yangzhou, 225009, China
| | - Yue Lu
- Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Key Laboratory of Plant Functional Genomics of the Ministry of Education, Agriculture College of Yangzhou University, Yangzhou, 225009, China
- Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops/Jiangsu Key Laboratory of Crop Genetics and Physiology, Yangzhou University, Yangzhou, 225009, China
| | - Tianyun Tao
- Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Key Laboratory of Plant Functional Genomics of the Ministry of Education, Agriculture College of Yangzhou University, Yangzhou, 225009, China
| | - Qianfeng Huang
- Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Key Laboratory of Plant Functional Genomics of the Ministry of Education, Agriculture College of Yangzhou University, Yangzhou, 225009, China
| | - Shuting Wang
- Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Key Laboratory of Plant Functional Genomics of the Ministry of Education, Agriculture College of Yangzhou University, Yangzhou, 225009, China
| | - Zhichao Wang
- Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Key Laboratory of Plant Functional Genomics of the Ministry of Education, Agriculture College of Yangzhou University, Yangzhou, 225009, China
| | - Mingli Chuan
- Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Key Laboratory of Plant Functional Genomics of the Ministry of Education, Agriculture College of Yangzhou University, Yangzhou, 225009, China
| | - Qing Bu
- Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Key Laboratory of Plant Functional Genomics of the Ministry of Education, Agriculture College of Yangzhou University, Yangzhou, 225009, China
| | - Zhou Lu
- Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Key Laboratory of Plant Functional Genomics of the Ministry of Education, Agriculture College of Yangzhou University, Yangzhou, 225009, China
| | - Hanyao Wang
- Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Key Laboratory of Plant Functional Genomics of the Ministry of Education, Agriculture College of Yangzhou University, Yangzhou, 225009, China
| | - Yanze Su
- Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Key Laboratory of Plant Functional Genomics of the Ministry of Education, Agriculture College of Yangzhou University, Yangzhou, 225009, China
| | - Yi Ji
- Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Key Laboratory of Plant Functional Genomics of the Ministry of Education, Agriculture College of Yangzhou University, Yangzhou, 225009, China
| | - Jianheng Ding
- Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Key Laboratory of Plant Functional Genomics of the Ministry of Education, Agriculture College of Yangzhou University, Yangzhou, 225009, China
| | - Ahmed Gharib
- Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Key Laboratory of Plant Functional Genomics of the Ministry of Education, Agriculture College of Yangzhou University, Yangzhou, 225009, China
- Rice Department, Field Crops Research Institute, ARC, Sakha, Kafr El-Sheikh, 33717, Egypt
| | - Huixin Liu
- Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Key Laboratory of Plant Functional Genomics of the Ministry of Education, Agriculture College of Yangzhou University, Yangzhou, 225009, China
- Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops/Jiangsu Key Laboratory of Crop Genetics and Physiology, Yangzhou University, Yangzhou, 225009, China
| | - Yong Zhou
- Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Key Laboratory of Plant Functional Genomics of the Ministry of Education, Agriculture College of Yangzhou University, Yangzhou, 225009, China
- Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops/Jiangsu Key Laboratory of Crop Genetics and Physiology, Yangzhou University, Yangzhou, 225009, China
| | - Shuzhu Tang
- Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Key Laboratory of Plant Functional Genomics of the Ministry of Education, Agriculture College of Yangzhou University, Yangzhou, 225009, China
- Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops/Jiangsu Key Laboratory of Crop Genetics and Physiology, Yangzhou University, Yangzhou, 225009, China
| | - Guohua Liang
- Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Key Laboratory of Plant Functional Genomics of the Ministry of Education, Agriculture College of Yangzhou University, Yangzhou, 225009, China
- Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops/Jiangsu Key Laboratory of Crop Genetics and Physiology, Yangzhou University, Yangzhou, 225009, China
| | - Honggen Zhang
- Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Key Laboratory of Plant Functional Genomics of the Ministry of Education, Agriculture College of Yangzhou University, Yangzhou, 225009, China
- Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops/Jiangsu Key Laboratory of Crop Genetics and Physiology, Yangzhou University, Yangzhou, 225009, China
| | - Chuandeng Yi
- Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Key Laboratory of Plant Functional Genomics of the Ministry of Education, Agriculture College of Yangzhou University, Yangzhou, 225009, China
- Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops/Jiangsu Key Laboratory of Crop Genetics and Physiology, Yangzhou University, Yangzhou, 225009, China
| | - Xiaoming Zheng
- National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
| | - Zhukuan Cheng
- Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Key Laboratory of Plant Functional Genomics of the Ministry of Education, Agriculture College of Yangzhou University, Yangzhou, 225009, China
- Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops/Jiangsu Key Laboratory of Crop Genetics and Physiology, Yangzhou University, Yangzhou, 225009, China
| | - Yang Xu
- Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Key Laboratory of Plant Functional Genomics of the Ministry of Education, Agriculture College of Yangzhou University, Yangzhou, 225009, China
- Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops/Jiangsu Key Laboratory of Crop Genetics and Physiology, Yangzhou University, Yangzhou, 225009, China
| | - Pengcheng Li
- Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Key Laboratory of Plant Functional Genomics of the Ministry of Education, Agriculture College of Yangzhou University, Yangzhou, 225009, China
- Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops/Jiangsu Key Laboratory of Crop Genetics and Physiology, Yangzhou University, Yangzhou, 225009, China
| | - Chenwu Xu
- Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Key Laboratory of Plant Functional Genomics of the Ministry of Education, Agriculture College of Yangzhou University, Yangzhou, 225009, China.
- Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops/Jiangsu Key Laboratory of Crop Genetics and Physiology, Yangzhou University, Yangzhou, 225009, China.
| | - Jinling Huang
- Department of Biology, East Carolina University, Greenville, NC, 27858, USA.
- State Key Laboratory of Crop Stress Adaptation and Improvement, Key Laboratory of Plant Stress Biology, School of Life Sciences, Henan University, Kaifeng, 475004, China.
- Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, China.
| | - Aihong Li
- Institute of Agricultural Sciences for Lixiahe Region in Jiangsu, Yangzhou, 225009, China.
| | - Zefeng Yang
- Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Key Laboratory of Plant Functional Genomics of the Ministry of Education, Agriculture College of Yangzhou University, Yangzhou, 225009, China.
- Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops/Jiangsu Key Laboratory of Crop Genetics and Physiology, Yangzhou University, Yangzhou, 225009, China.
|
18
|
Trexler M, Bányai L, Kerekes K, Patthy L. Evolution of termination codons of proteins and the TAG-TGA paradox. Sci Rep 2023; 13:14294. [PMID: 37653005 PMCID: PMC10471768 DOI: 10.1038/s41598-023-41410-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 08/25/2023] [Indexed: 09/02/2023] Open
Abstract
In most eukaryotes and prokaryotes TGA is used at a significantly higher frequency than TAG as the termination codon of protein-coding genes. Although this phenomenon was recognized several years ago, there is no generally accepted explanation for the TAG-TGA paradox. Our analyses of human mutation data revealed that out of the eighteen sense codons that can give rise to a nonsense codon by single base substitution, the CGA codon is exceptional: it gives rise to the TGA stop codon at an order of magnitude higher rate than the other codons. Here we propose that the TAG-TGA paradox is due to methylation and hypermutability of CpG dinucleotides. In harmony with this explanation, we show that the coding genomes of organisms with strong CpG methylation have a significant bias for TGA whereas those from organisms that lack CpG methylation use TGA and TAG termination codons with similar probability.
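The count of eighteen sense codons quoted in this abstract can be checked by enumeration. The following is a minimal sketch, not the authors' method: it assumes the standard genetic code (stop codons TAA, TAG, TGA), and the function names are hypothetical. It also assumes, for illustration only, that CpG hypermutation acts as a C>T or G>A transition at a CpG site lying entirely within the codon.

```python
from itertools import product

BASES = "TCAG"
STOPS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def single_substitution_neighbors(codon):
    """Yield every codon that differs from `codon` at exactly one position."""
    for pos, base in product(range(3), BASES):
        if base != codon[pos]:
            yield codon[:pos] + base + codon[pos + 1:]


def stop_precursors():
    """Map each sense codon to the stop codons it can reach by one substitution."""
    precursors = {}
    for stop in STOPS:
        for neighbor in single_substitution_neighbors(stop):
            if neighbor not in STOPS:
                precursors.setdefault(neighbor, set()).add(stop)
    return precursors


def cpg_transition_hits(precursors):
    """List (sense codon, stop codon) pairs produced by a CpG transition.

    Assumption: only CpG sites lying entirely inside the codon are considered,
    and the mutation is modelled as C>T on the C or G>A on the G of the CpG.
    """
    hits = []
    for codon in sorted(precursors):
        for pos in range(2):
            if codon[pos:pos + 2] == "CG":
                for mutant in (codon[:pos] + "TG" + codon[pos + 2:],
                               codon[:pos] + "CA" + codon[pos + 2:]):
                    if mutant in STOPS:
                        hits.append((codon, mutant))
    return hits


precursors = stop_precursors()
hits = cpg_transition_hits(precursors)
print(len(precursors))  # 18
print(hits)             # [('CGA', 'TGA')]
```

Under these assumptions, CGA is the only one of the eighteen precursor codons in which a within-codon CpG transition creates a stop codon, and the stop it creates is TGA. The sketch says nothing about mutation rates, which the abstract derives from human mutation data.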
Affiliation(s)
- Mária Trexler
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest, 1117, Hungary
| | - László Bányai
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest, 1117, Hungary
| | - Krisztina Kerekes
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest, 1117, Hungary
| | - László Patthy
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest, 1117, Hungary.
|
19
|
Liang X, Heath LS. Towards understanding paleoclimate impacts on primate de novo genes. G3 (Bethesda) 2023; 13:jkad135. [PMID: 37313728 PMCID: PMC10468307 DOI: 10.1093/g3journal/jkad135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 05/31/2023] [Accepted: 06/08/2023] [Indexed: 06/15/2023]
Abstract
De novo genes are genes that newly emerge in particular species; primate de novo genes, for example, emerge in certain primate species. Over the past decade, a great deal of research has been conducted regarding their emergence, origins, functions, and various attributes in different species, some of which has involved estimating the ages of de novo genes. However, limited by the number of species available for whole-genome sequencing, relatively few studies have focused specifically on the emergence time of primate de novo genes. Among those, even fewer investigate the association of primate gene emergence with environmental factors, such as paleoclimate (ancient climate) conditions. This study investigates the relationship between paleoclimate and human gene emergence at primate species divergence. Based on 32 available primate genome sequences, this study has revealed possible associations between temperature changes and the emergence of de novo primate genes. Overall, this study finds that de novo genes tended to emerge in the most recent 13 million years, while the temperature continued cooling, which is consistent with past findings. Furthermore, in the context of an overall trend of cooling temperature, new primate genes were more likely to emerge during local warming periods, where the warm temperature more closely resembled the environmental condition that preceded the cooling trend. Results also indicate that both primate de novo genes and human cancer-associated genes have later origins in comparison to random human genes. Future studies can examine in more depth human de novo gene emergence from an environmental perspective, as well as species divergence from a gene emergence perspective.
Affiliation(s)
- Xiao Liang
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
| | - Lenwood S Heath
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
|
20
|
Peng J, Zhao L. The origin and structural evolution of de novo genes in Drosophila. bioRxiv 2023:2023.03.13.532420. [PMID: 37425675 PMCID: PMC10326970 DOI: 10.1101/2023.03.13.532420] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 07/11/2023]
Abstract
Although previously thought to be unlikely, recent studies have shown that de novo gene origination from previously non-genic sequences is a relatively common mechanism for gene innovation in many species and taxa. These young genes provide a unique set of candidates to study the structural and functional origination of proteins. However, our understanding of their protein structures and how these structures originate and evolve is still limited, due to a lack of systematic studies. Here, we combined high-quality base-level whole genome alignments, bioinformatic analysis, and computational structure modeling to study the origination, evolution, and protein structure of lineage-specific de novo genes. We identified 555 de novo gene candidates in D. melanogaster that originated within the Drosophilinae lineage. We found a gradual shift in sequence composition, evolutionary rates, and expression patterns with their gene ages, which indicates possible gradual shifts or adaptations of their functions. Surprisingly, we found little overall protein structural change for de novo genes in the Drosophilinae lineage. Using AlphaFold2, ESMFold, and molecular dynamics, we identified a number of de novo gene candidates with protein products that are potentially well-folded, many of which are more likely to contain transmembrane and signal proteins compared to other annotated protein-coding genes. Using ancestral sequence reconstruction, we found that most potentially well-folded proteins are often born folded. Interestingly, we observed one case where disordered ancestral proteins become ordered within a relatively short evolutionary time. Single-cell RNA-seq analysis in testis showed that although most de novo genes are enriched in spermatocytes, several young de novo genes are biased in the early spermatogenesis stage, indicating potentially important but less emphasized roles of early germline cells in the de novo gene origination in testis. This study provides a systematic overview of the origin, evolution, and structural changes of Drosophilinae-specific de novo genes.
Affiliation(s)
- Junhui Peng
- Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY 10065, USA
- Li Zhao
- Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY 10065, USA

21
Kozlov AP. Carcino-Evo-Devo, A Theory of the Evolutionary Role of Hereditary Tumors. Int J Mol Sci 2023; 24:ijms24108611. [PMID: 37239953 DOI: 10.3390/ijms24108611] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/16/2023] [Revised: 05/08/2023] [Accepted: 05/09/2023] [Indexed: 05/28/2023] Open
Abstract
A theory of the evolutionary role of hereditary tumors, or the carcino-evo-devo theory, is being developed. The main hypothesis of the theory, the hypothesis of evolution by tumor neofunctionalization, posits that hereditary tumors provided additional cell masses during the evolution of multicellular organisms for the expression of evolutionarily novel genes. The carcino-evo-devo theory has formulated several nontrivial predictions that have been confirmed in the laboratory of the author. It also suggests several nontrivial explanations of biological phenomena previously unexplained by the existing theories or incompletely understood. By considering three major types of biological development (individual, evolutionary, and neoplastic development) within one theoretical framework, the carcino-evo-devo theory has the potential to become a unifying biological theory.
Affiliation(s)
- Andrei P Kozlov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 3 Gubkina Street, 117971 Moscow, Russia
- Peter the Great St. Petersburg Polytechnic University, 29 Polytekhnicheskaya Street, 195251 St. Petersburg, Russia

22
Sandmann CL, Schulz JF, Ruiz-Orera J, Kirchner M, Ziehm M, Adami E, Marczenke M, Christ A, Liebe N, Greiner J, Schoenenberger A, Muecke MB, Liang N, Moritz RL, Sun Z, Deutsch EW, Gotthardt M, Mudge JM, Prensner JR, Willnow TE, Mertins P, van Heesch S, Hubner N. Evolutionary origins and interactomes of human, young microproteins and small peptides translated from short open reading frames. Mol Cell 2023; 83:994-1011.e18. [PMID: 36806354 PMCID: PMC10032668 DOI: 10.1016/j.molcel.2023.01.023] [Citation(s) in RCA: 27] [Impact Index Per Article: 27.0] [Received: 09/08/2022] [Revised: 12/12/2022] [Accepted: 01/25/2023] [Indexed: 02/19/2023]
Abstract
All species continuously evolve short open reading frames (sORFs) that can be templated for protein synthesis and may provide raw materials for evolutionary adaptation. We analyzed the evolutionary origins of 7,264 recently cataloged human sORFs and found that most were evolutionarily young and had emerged de novo. We additionally identified 221 previously missed sORFs potentially translated into peptides of up to 15 amino acids, all of which are smaller than the smallest human microprotein annotated to date. To investigate the bioactivity of sORF-encoded small peptides and young microproteins, we subjected 266 candidates to a mass-spectrometry-based interactome screen with motif resolution. Based on these interactomes and additional cellular assays, we can associate several candidates with mRNA splicing, translational regulation, and endocytosis. Our work provides insights into the evolutionary origins and interaction potential of young and small proteins, thereby helping to elucidate this underexplored territory of the human proteome.
Affiliation(s)
- Clara-L Sandmann
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347 Berlin, Germany
- Jana F Schulz
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347 Berlin, Germany
- Jorge Ruiz-Orera
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Marieluise Kirchner
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Core Facility Proteomics, 10117 Berlin, Germany
- Matthias Ziehm
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Core Facility Proteomics, 10117 Berlin, Germany
- Eleonora Adami
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Maike Marczenke
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Annabel Christ
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Nina Liebe
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Johannes Greiner
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Aaron Schoenenberger
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Michael B Muecke
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347 Berlin, Germany; Charité-Universitätsmedizin, 10117 Berlin, Germany
- Ning Liang
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Zhi Sun
- Institute for Systems Biology, Seattle, WA 98109, USA
- Michael Gotthardt
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347 Berlin, Germany; Charité-Universitätsmedizin, 10117 Berlin, Germany
- Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- John R Prensner
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Division of Pediatric Hematology/Oncology, Boston Children's Hospital, Boston, MA 02115, USA
- Thomas E Willnow
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; Department of Biomedicine, Aarhus University, 8000 Aarhus, Denmark
- Philipp Mertins
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Core Facility Proteomics, 10117 Berlin, Germany
- Norbert Hubner
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347 Berlin, Germany; Charité-Universitätsmedizin, 10117 Berlin, Germany

23
Evolution and implications of de novo genes in humans. Nat Ecol Evol 2023:10.1038/s41559-023-02014-y. [PMID: 36928843 DOI: 10.1038/s41559-023-02014-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 07/24/2022] [Accepted: 02/06/2023] [Indexed: 03/18/2023]
Abstract
Genes and translated open reading frames (ORFs) that emerged de novo from previously non-coding sequences provide species with opportunities for adaptation. When aberrantly activated, some human-specific de novo genes and ORFs have disease-promoting properties, for instance driving tumour growth. Thousands of putative de novo coding sequences have been described in humans, but we still do not know what fraction of those ORFs has readily acquired a function. Here, we discuss the challenges and controversies surrounding the detection, mechanisms of origin, annotation, validation and characterization of de novo genes and ORFs. Through manual curation of literature and databases, we provide a thorough table with most de novo genes reported for humans to date. We re-evaluate each locus by tracing the enabling mutations and list proposed disease associations, protein characteristics and supporting evidence for translation and protein detection. This work will support future explorations of de novo genes and ORFs in humans.

24
Vakirlis N, Vance Z, Duggan KM, McLysaght A. De novo birth of functional microproteins in the human lineage. Cell Rep 2022; 41:111808. [PMID: 36543139 PMCID: PMC10073203 DOI: 10.1016/j.celrep.2022.111808] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 10/18/2021] [Revised: 06/21/2022] [Accepted: 11/18/2022] [Indexed: 12/24/2022] Open
Abstract
Small open reading frames (sORFs) can encode functional "microproteins" that perform crucial biological tasks. However, their size makes them less amenable to genomic analysis, and their origins and conservation are poorly understood. Given their short length, it is plausible that some of these functional microproteins have recently originated entirely de novo from noncoding sequences. Here we sought to identify such cases in the human lineage by reconstructing the evolutionary origins of human microproteins previously found to have measurable, statistically significant fitness effects. By tracing the formation of each ORF and its transcriptional activation, we show that novel microproteins with significant phenotypic effects have emerged de novo throughout animal evolution, including two after the human-chimpanzee split. Notably, traditional methods for assessing coding potential would miss most of these cases. This evidence demonstrates that the functional potential intrinsic to sORFs can be relatively rapidly and frequently realized through de novo gene emergence.
Affiliation(s)
- Nikolaos Vakirlis
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center "Alexander Fleming", Vari, Greece
- Zoe Vance
- Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, Ireland
- Kate M Duggan
- Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, Ireland
- Aoife McLysaght
- Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, Ireland

25
The Theory of Carcino-Evo-Devo and Its Non-Trivial Predictions. Genes (Basel) 2022; 13:genes13122347. [PMID: 36553613 PMCID: PMC9777766 DOI: 10.3390/genes13122347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2022] [Revised: 12/04/2022] [Accepted: 12/08/2022] [Indexed: 12/15/2022] Open
Abstract
To explain the sources of additional cell masses in the evolution of multicellular organisms, the theory of carcino-evo-devo, or evolution by tumor neofunctionalization, has been developed. The important demand for a new theory in experimental science is the capability to formulate non-trivial predictions which can be experimentally confirmed. Several non-trivial predictions were formulated using carcino-evo-devo theory, four of which are discussed in the present paper: (1) The number of cellular oncogenes should correspond to the number of cell types in the organism. The evolution of oncogenes, tumor suppressor and differentiation gene classes should proceed concurrently. (2) Evolutionarily new and evolving genes should be specifically expressed in tumors (TSEEN genes). (3) Human orthologs of fish TSEEN genes should acquire progressive functions connected with new cell types, tissues and organs. (4) Selection of tumors for new functions in the organism is possible. Evolutionarily novel organs should recapitulate tumor features in their development. As shown in this paper, these predictions have been confirmed by the laboratory of the author. Thus, we have shown that carcino-evo-devo theory has predictive power, fulfilling a fundamental requirement for a new theory.

26
Petrzilek J, Pasulka J, Malik R, Horvat F, Kataruka S, Fulka H, Svoboda P. De novo emergence, existence, and demise of a protein-coding gene in murids. BMC Biol 2022; 20:272. [PMID: 36482406 PMCID: PMC9733328 DOI: 10.1186/s12915-022-01470-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 11/15/2022] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND: Genes, principal units of genetic information, vary in complexity and evolutionary history. Less-complex genes (e.g., long non-coding RNA (lncRNA) expressing genes) readily emerge de novo from non-genic sequences and have high evolutionary turnover. Genesis of a gene may be facilitated by adoption of functional genic sequences from retrotransposon insertions. However, protein-coding sequences in extant genomes rarely lack any connection to an ancestral protein-coding sequence. RESULTS: We describe the remarkable evolution of the murine gene D6Ertd527e and its orthologs in the rodent Muroidea superfamily. D6Ertd527e emerged in a common ancestor of mice and hamsters, most likely as a lncRNA-expressing gene. A major contributing factor was a long terminal repeat (LTR) retrotransposon insertion carrying an oocyte-specific promoter and a 5' terminal exon of the gene. The gene survived as an oocyte-specific lncRNA in several extant rodents, while in some others the gene or its expression was lost. In the ancestral lineage of Mus musculus, the gene acquired protein-coding capacity where the bulk of the coding sequence formed through CAG (AGC) trinucleotide repeat expansion and duplications. These events generated a cytoplasmic serine-rich maternal protein. Knock-out of D6Ertd527e in mice has a small but detectable effect on fertility and the maternal transcriptome. CONCLUSIONS: While this evolving gene is not showing a clear function in laboratory mice, its documented evolutionary history in Muroidea during the last ~40 million years provides a textbook example of how several common mutation events can support de novo gene formation, evolution of protein-coding capacity, as well as a gene's demise.
Affiliation(s)
- Jan Petrzilek
- Institute of Molecular Genetics of the Czech Academy of Sciences, Videnska 1083, 142 20 Prague 4, Czech Republic; Present address: Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, Vienna, Austria
- Josef Pasulka
- Institute of Molecular Genetics of the Czech Academy of Sciences, Videnska 1083, 142 20 Prague 4, Czech Republic
- Radek Malik
- Institute of Molecular Genetics of the Czech Academy of Sciences, Videnska 1083, 142 20 Prague 4, Czech Republic
- Filip Horvat
- Institute of Molecular Genetics of the Czech Academy of Sciences, Videnska 1083, 142 20 Prague 4, Czech Republic; Bioinformatics Group, Division of Biology, Faculty of Science, University of Zagreb, Horvatovac 102a, 10000 Zagreb, Croatia
- Shubhangini Kataruka
- Institute of Molecular Genetics of the Czech Academy of Sciences, Videnska 1083, 142 20 Prague 4, Czech Republic; Present address: Department of Genetics, Yale School of Medicine, New Haven, CT 06510 USA
- Helena Fulka
- Institute of Molecular Genetics of the Czech Academy of Sciences, Videnska 1083, 142 20 Prague 4, Czech Republic; Current address: Institute of Experimental Medicine of the Czech Academy of Sciences, Videnska 1083, 142 20 Prague 4, Czech Republic
- Petr Svoboda
- Institute of Molecular Genetics of the Czech Academy of Sciences, Videnska 1083, 142 20 Prague 4, Czech Republic

27
Ma C, Li C, Ma H, Yu D, Zhang Y, Zhang D, Su T, Wu J, Wang X, Zhang L, Chen CL, Zhang YE. Pan-cancer surveys indicate cell cycle-related roles of primate-specific genes in tumors and embryonic cerebrum. Genome Biol 2022; 23:251. [PMID: 36474250 PMCID: PMC9724437 DOI: 10.1186/s13059-022-02821-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/27/2021] [Accepted: 11/24/2022] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND: Despite having been extensively studied, it remains largely unclear why humans bear a particularly high risk of cancer. The antagonistic pleiotropy hypothesis predicts that primate-specific genes (PSGs) tend to promote tumorigenesis, while the molecular atavism hypothesis predicts that PSGs involved in tumors may represent recently derived duplicates of unicellular genes. However, these predictions have not been tested. RESULTS: By taking advantage of pan-cancer genomic data, we find the upregulation of PSGs across 13 cancer types, which is facilitated by copy-number gain and promoter hypomethylation. Meta-analyses indicate that upregulated PSGs (uPSGs) tend to promote tumorigenesis and to play cell cycle-related roles. The cell cycle-related uPSGs predominantly represent derived duplicates of unicellular genes. We prioritize 15 uPSGs and perform an in-depth analysis of one unicellular gene-derived duplicate involved in the cell cycle, DDX11. Genome-wide screening data and knockdown experiments demonstrate that DDX11 is broadly essential across cancer cell lines. Importantly, non-neutral amino acid substitution patterns and increased expression indicate that DDX11 has been under positive selection. Finally, we find that cell cycle-related uPSGs are also preferentially upregulated in the highly proliferative embryonic cerebrum. CONCLUSIONS: Consistent with the predictions of the atavism and antagonistic pleiotropy hypotheses, primate-specific genes, especially those PSGs derived from cell cycle-related genes that emerged in unicellular ancestors, contribute to the early proliferation of the human cerebrum at the cost of hitchhiking by similarly highly proliferative cancer cells.
Affiliation(s)
- Chenyu Ma
- Key Laboratory of Zoological Systematics and Evolution & State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101 China; University of Chinese Academy of Sciences, Beijing, 100049 China
- Chunyan Li
- School of Engineering Medicine, Key Laboratory of Big Data-Based Precision Medicine (Ministry of Industry and Information Technology), and Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Beihang University, Beijing, 100191 China
- Huijing Ma
- Key Laboratory of Zoological Systematics and Evolution & State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101 China
- Daqi Yu
- Key Laboratory of Zoological Systematics and Evolution & State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101 China; University of Chinese Academy of Sciences, Beijing, 100049 China
- Yufei Zhang
- Key Laboratory of Zoological Systematics and Evolution & State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101 China; University of Chinese Academy of Sciences, Beijing, 100049 China; School of Life Sciences, Nanjing University, Nanjing, 210093 China
- Dan Zhang
- Key Laboratory of Zoological Systematics and Evolution & State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101 China
- Tianhan Su
- Key Laboratory of Zoological Systematics and Evolution & State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101 China; University of Chinese Academy of Sciences, Beijing, 100049 China
- Jianmin Wu
- Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Center for Cancer Bioinformatics, Peking University Cancer Hospital & Institute, Beijing, 100142 China
- Xiaoyue Wang
- State Key Laboratory of Medical Molecular Biology, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences Chinese Academy of Medical Sciences, School of Basic Medicine Peking Union Medical College, Beijing, China
- Li Zhang
- Chinese Institute for Brain Research, Beijing, 102206 China
- Chun-Long Chen
- Institut Curie, Université PSL, Sorbonne Université, CNRS UMR3244, Dynamics of Genetic Information, 75005 Paris, France
- Yong E. Zhang
- Key Laboratory of Zoological Systematics and Evolution & State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101 China; University of Chinese Academy of Sciences, Beijing, 100049 China; Chinese Institute for Brain Research, Beijing, 102206 China; CAS Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223 China

28
Parikh SB, Houghton C, Van Oss SB, Wacholder A, Carvunis A. Origins, evolution, and physiological implications of de novo genes in yeast. Yeast 2022; 39:471-481. [PMID: 35959631 PMCID: PMC9544372 DOI: 10.1002/yea.3810] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/20/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 12/03/2022] Open
Abstract
De novo gene birth is the process by which new genes emerge in sequences that were previously noncoding. Over the past decade, researchers have taken advantage of the power of yeast as a model and a tool to study the evolutionary mechanisms and physiological implications of de novo gene birth. We summarize the mechanisms that have been proposed to explicate how noncoding sequences can become protein-coding genes, highlighting the discovery of pervasive translation of the yeast transcriptome and its presumed impact on evolutionary innovation. We summarize current best practices for the identification and characterization of de novo genes. Crucially, we explain that the field is still in its nascency, with the physiological roles of most young yeast de novo genes identified thus far still utterly unknown. We hope this review inspires researchers to investigate the true contribution of de novo gene birth to cellular physiology and phenotypic diversity across yeast strains and species.
Affiliation(s)
- Saurin B. Parikh
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and Evolution, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Carly Houghton
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and Evolution, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- S. Branden Van Oss
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and Evolution, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Aaron Wacholder
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and Evolution, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Anne‐Ruxandra Carvunis
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and Evolution, University of Pittsburgh, Pittsburgh, Pennsylvania, USA

29
Abstract
"De novo" genes evolve from previously non-genic DNA. This strikes many of us as remarkable, because it seems extraordinarily unlikely that random sequence would produce a functional gene. How is this possible? In this two-part review, I first summarize what is known about the origins and molecular functions of the small number of de novo genes for which such information is available. I then speculate on what these examples may tell us about how de novo genes manage to emerge despite what seem like enormous opposing odds.
Affiliation(s)
- Caroline M Weisman
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA

30
Song H, Guo Z, Zhang X, Sui J. De novo genes in Arachis hypogaea cv. Tifrunner: systematic identification, molecular evolution, and potential contributions to cultivated peanut. Plant J 2022; 111:1081-1095. [PMID: 35748398 DOI: 10.1111/tpj.15875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2021] [Revised: 06/15/2022] [Accepted: 06/21/2022] [Indexed: 06/15/2023]
Abstract
De novo genes are derived from non-coding sequences, and they can play essential roles in organisms. Cultivated peanut (Arachis hypogaea) is a major oil and protein crop derived from a cross between Arachis duranensis and Arachis ipaensis. However, few de novo genes have been documented in Arachis. Here, we identified 381 de novo genes in A. hypogaea cv. Tifrunner based on comparison with five closely related Arachis species. There are distinct differences in gene expression patterns and gene structures between conserved and de novo genes. The identified de novo genes originated from ancestral sequence regions associated with metabolic and biosynthetic processes, and they were subsequently integrated into existing regulatory networks. De novo paralogs and homoeologs were identified in A. hypogaea cv. Tifrunner. De novo paralogs and homoeologs with conserved expression have mismatching cis-acting elements under normal growth conditions. De novo genes potentially have pluripotent functions in responses to biotic stresses as well as in growth and development based on quantitative trait locus data. This work provides a foundation for future research examining gene birth processes and gene function in Arachis and related taxa.
Affiliation(s)
- Hui Song
- Grassland Agri-husbandry Research Center, College of Grassland Science, Qingdao Agricultural University, Qingdao, China
- Zhonglong Guo
- State Key Laboratory of Protein and Plant Gene Research, Peking-Tsinghua Center for Life Sciences, School of Life Sciences and School of Advanced Agricultural Sciences, Peking University, Beijing, China
- Xiaojun Zhang
- College of Agronomy, Qingdao Agricultural University, Qingdao, China
- Jiongming Sui
- College of Agronomy, Qingdao Agricultural University, Qingdao, China

31
Eicholt LA, Aubel M, Berk K, Bornberg‐Bauer E, Lange A. Heterologous expression of naturally evolved putative de novo proteins with chaperones. Protein Sci 2022; 31:e4371. [PMID: 35900020 PMCID: PMC9278007 DOI: 10.1002/pro.4371] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/27/2022] [Revised: 05/03/2022] [Accepted: 05/14/2022] [Indexed: 11/23/2022]
Abstract
Over the past decade, evidence has accumulated that new protein‐coding genes can emerge de novo from previously non‐coding DNA. Most studies have focused on large scale computational predictions of de novo protein‐coding genes across a wide range of organisms. In contrast, experimental data concerning the folding and function of de novo proteins are scarce. This might be due to difficulties in handling de novo proteins in vitro, as most are short and predicted to be disordered. Here, we propose a guideline for the effective expression of eukaryotic de novo proteins in Escherichia coli. We used 11 sequences from Drosophila melanogaster and 10 from Homo sapiens, that are predicted de novo proteins from former studies, for heterologous expression. The candidate de novo proteins have varying secondary structure and disorder content. Using multiple combinations of purification tags, E. coli expression strains, and chaperone systems, we were able to increase the number of solubly expressed putative de novo proteins from 30% to 62%. Our findings indicate that the best combination for expressing putative de novo proteins in E. coli is a GST‐tag with T7 Express cells and co‐expressed chaperones. We found that, overall, proteins with higher predicted disorder were easier to express.
Affiliation(s)
- Lars A. Eicholt
- Institute for Evolution and Biodiversity, University of Muenster, Münster, Germany
- Margaux Aubel
- Institute for Evolution and Biodiversity, University of Muenster, Münster, Germany
- Katrin Berk
- Institute for Evolution and Biodiversity, University of Muenster, Münster, Germany
- Erich Bornberg‐Bauer
- Institute for Evolution and Biodiversity, University of Muenster, Münster, Germany
- Max Planck‐Institute for Biology Tuebingen, Tübingen, Germany
- Andreas Lange
- Institute for Evolution and Biodiversity, University of Muenster, Münster, Germany

32
Cardoso-Silva CB, Aono AH, Mancini MC, Sforça DA, da Silva CC, Pinto LR, Adams KL, de Souza AP. Taxonomically Restricted Genes Are Associated With Responses to Biotic and Abiotic Stresses in Sugarcane (Saccharum spp.). Front Plant Sci 2022; 13:923069. [PMID: 35845637 PMCID: PMC9280035 DOI: 10.3389/fpls.2022.923069] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/18/2022] [Accepted: 06/13/2022] [Indexed: 06/15/2023]
Abstract
Orphan genes (OGs) are protein-coding genes that are restricted to particular clades or species and lack homology with genes from other organisms, making their biological functions difficult to predict. OGs can rapidly originate and become functional; consequently, they may support rapid adaptation to environmental changes. Extensive spread of mobile elements and whole-genome duplication occurred in the Saccharum group, which may have contributed to the origin and diversification of OGs in the sugarcane genome. Here, we identified and characterized OGs in sugarcane, examined their expression profiles across tissues and genotypes, and investigated their regulation under varying conditions. We identified 319 OGs in the Saccharum spontaneum genome without detected homology to protein-coding genes in green plants, except those belonging to Saccharinae. Transcriptomic analysis revealed 288 sugarcane OGs with detectable expression levels in at least one tissue or genotype. We observed similar expression patterns of OGs in sugarcane genotypes originating from the closest geographical locations. We also observed tissue-specific expression of some OGs, possibly indicating a complex regulatory process for maintaining diverse functional activity of these genes across sugarcane tissues and genotypes. Sixty-six OGs were differentially expressed under stress conditions, especially cold and osmotic stresses. Gene co-expression network and functional enrichment analyses suggested that sugarcane OGs are involved in several biological mechanisms, including stimulus response and defence mechanisms. These findings provide a valuable genomic resource for sugarcane researchers, especially those interested in selecting stress-responsive genes.
Affiliation(s)
- Cláudio Benício Cardoso-Silva
- Center of Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, Brazil
- Department of Botany, University of British Columbia, Vancouver, BC, Canada
| | - Alexandre Hild Aono
- Center of Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, Brazil
| | - Melina Cristina Mancini
- Center of Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, Brazil
| | - Danilo Augusto Sforça
- Center of Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, Brazil
| | - Carla Cristina da Silva
- Center of Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, Brazil
- Agronomy Department, Federal University of Viçosa (UFV), Viçosa, Brazil
| | - Luciana Rossini Pinto
- Sugarcane Research Advanced Centre, Agronomic Institute of Campinas (IAC/APTA), Ribeirão Preto, Brazil
| | - Keith L. Adams
- Department of Botany, University of British Columbia, Vancouver, BC, Canada
| | - Anete Pereira de Souza
- Center of Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, Brazil
- Institute of Biology, University of Campinas (UNICAMP), Campinas, Brazil
| |
|
33
|
Kariñho-Betancourt E, Carlson D, Hollister J, Fischer A, Greiner S, Johnson MTJ. The evolution of multi-gene families and metabolic pathways in the evening primroses (Oenothera: Onagraceae): A comparative transcriptomics approach. PLoS One 2022; 17:e0269307. [PMID: 35749399 PMCID: PMC9231714 DOI: 10.1371/journal.pone.0269307] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/19/2021] [Accepted: 05/18/2022] [Indexed: 12/02/2022] Open
Abstract
The plant genus Oenothera has played an important role in the study of plant genome evolution, defense, and reproduction. Here, we build on the 1kp transcriptomic dataset by creating 44 new transcriptomes and analyzing a total of 63 transcriptomes to present a large-scale comparative study across 29 Oenothera species. Our dataset included 30.4 million reads per individual and 2.3 million transcripts on average. We used this transcriptome resource to examine genome-wide evolutionary patterns and functional diversification by searching for orthologous genes and performing gene family evolution analysis. We found wide heterogeneity in gene family evolution across the genus, with section Oenothera exhibiting the most pronounced evolutionary changes. Overall, more significant gene family expansions occurred than contractions. We also analyzed the molecular evolution of phenolic metabolism by retrieving proteins annotated for phenolic enzymatic complexes. We identified 1,568 phenolic genes arranged into 83 multigene families that varied widely across the genus. All taxa experienced rapid phenolic evolution (fast rate of genomic turnover) involving 33 gene families, which exhibited large expansions, gaining about 2-fold more genes than they lost. Upstream enzymes phenylalanine ammonia-lyase (PAL) and 4-coumaroyl: CoA ligase (4CL) accounted for most of the significant expansions and contractions. Our results suggest that adaptive and neutral evolutionary processes have contributed to Oenothera diversification and rapid gene family evolution.
Affiliation(s)
- Eunice Kariñho-Betancourt
- Department of Biology, University of Toronto Mississauga, Mississauga, Ontario, Canada
- * E-mail: (EKB); (MTJJ)
| | - David Carlson
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, New York, United States of America
| | - Jessie Hollister
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, New York, United States of America
| | - Axel Fischer
- Max Planck Institute of Molecular Plant Physiology, Potsdam-Golm, Germany
| | - Stephan Greiner
- Max Planck Institute of Molecular Plant Physiology, Potsdam-Golm, Germany
| | - Marc T. J. Johnson
- Department of Biology, University of Toronto Mississauga, Mississauga, Ontario, Canada
- * E-mail: (EKB); (MTJJ)
| |
|
34
|
Huang Y, Shang R, Lu GA, Zeng W, Huang C, Zou C, Tang T. Spatiotemporal Regulation of a Single Adaptively Evolving Trans-Regulatory Element Contributes to Spermatogenetic Expression Divergence in Drosophila. Mol Biol Evol 2022; 39:6605656. [PMID: 35687719 PMCID: PMC9254010 DOI: 10.1093/molbev/msac127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022] Open
Abstract
Due to extensive pleiotropy, trans-acting elements are often thought to be evolutionarily constrained. While the impact of trans-acting elements on gene expression evolution has been extensively studied, relatively little is understood about the contribution of a single trans regulator to interspecific expression and phenotypic divergence. Here, we disentangle the effects of genomic context and miR-983, an adaptively evolving young microRNA, on expression divergence between Drosophila melanogaster and D. simulans. We show miR-983 effects promote interspecific expression divergence in testis despite its antagonism with the often-predominant context effects. Single-cyst RNA-seq reveals that distinct sets of genes gain and lose miR-983 influence under disruptive or diversifying selection at different stages of spermatogenesis, potentially helping minimize antagonistic pleiotropy. At the round spermatid stage, the effects of miR-983 are weak and distributed, coincident with the transcriptome undergoing drastic expression changes. Knocking out miR-983 causes reduced sperm length with increased within-individual variation in D. melanogaster but not in D. simulans, and the D. melanogaster knockout also exhibits compromised sperm defense ability. Our results provide empirical evidence for the resolution of antagonistic pleiotropy and also have broad implications for the function and evolution of new trans regulators.
Affiliation(s)
- Yumei Huang
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, 510275 Guangzhou, Guangdong Province, China
| | - Rui Shang
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, 510275 Guangzhou, Guangdong Province, China
| | - Guang-An Lu
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, 510275 Guangzhou, Guangdong Province, China
| | - Weishun Zeng
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, 510275 Guangzhou, Guangdong Province, China
| | - Chenglong Huang
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, 510275 Guangzhou, Guangdong Province, China
| | - Chuangchao Zou
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, 510275 Guangzhou, Guangdong Province, China
| | - Tian Tang
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, 510275 Guangzhou, Guangdong Province, China
| |
|
35
|
Suenaga Y, Kato M, Nagai M, Nakatani K, Kogashi H, Kobatake M, Makino T. Open reading frame dominance indicates protein‐coding potential of RNAs. EMBO Rep 2022; 23:e54321. [PMID: 35438231 PMCID: PMC9171421 DOI: 10.15252/embr.202154321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2021] [Revised: 03/24/2022] [Accepted: 03/25/2022] [Indexed: 11/13/2022] Open
Abstract
Recent studies have identified numerous RNAs with both coding and noncoding functions. However, the sequence characteristics that determine this bifunctionality remain largely unknown. In the present study, we develop and test the open reading frame (ORF) dominance score, which we define as the fraction of the longest ORF in the sum of all putative ORF lengths. This score correlates with translation efficiency in coding transcripts and with translation of noncoding RNAs. In bacteria and archaea, coding and noncoding transcripts have narrow distributions of high and low ORF dominance, respectively, whereas those of eukaryotes show relatively broader ORF dominance distributions, with considerable overlap between coding and noncoding transcripts. The extent of overlap positively and negatively correlates with the mutation rate of genomes and the effective population size of species, respectively. Tissue‐specific transcripts show higher ORF dominance than ubiquitously expressed transcripts, and the majority of tissue‐specific transcripts are expressed in mature testes. These data suggest that the decrease in population size and the emergence of testes in eukaryotic organisms allowed for the evolution of potentially bifunctional RNAs.
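The ORF dominance score defined in the abstract (length of the longest ORF divided by the summed length of all putative ORFs) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the ORF-calling convention used here (forward strand only, three reading frames, each ORF running from the first ATG after the previous in-frame stop to the next in-frame stop, lengths counted in nucleotides including the stop codon) and the function names `find_orf_lengths` and `orf_dominance` are assumptions made for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orf_lengths(seq):
    """Return nucleotide lengths of putative ORFs on the forward strand.

    Assumed convention (hypothetical, for illustration): in each of the
    three frames, an ORF starts at the first ATG seen after the previous
    in-frame stop and ends at the next in-frame stop codon (inclusive).
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    lengths = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                lengths.append(i + 3 - start)
                start = None
    return lengths


def orf_dominance(seq):
    """Longest ORF length divided by the sum of all putative ORF lengths."""
    lengths = find_orf_lengths(seq)
    if not lengths:
        return 0.0  # no complete ORF found under the assumed convention
    return max(lengths) / sum(lengths)


if __name__ == "__main__":
    # One 9-nt ORF and one 6-nt ORF in frame 0: 9 / (9 + 6)
    print(orf_dominance("ATGAAATAAATGTAA"))
```

Under this convention a transcript with a single ORF scores 1, and the score falls as additional ORFs of comparable length appear, which matches the intuition that a dominant ORF signals coding potential.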
Affiliation(s)
- Yusuke Suenaga
- Department of Molecular Carcinogenesis, Chiba Cancer Centre Research Institute, Chiba, Japan
| | - Mamoru Kato
- Division of Bioinformatics, National Cancer Centre Research Institute, Tokyo, Japan
| | - Momoko Nagai
- Division of Bioinformatics, National Cancer Centre Research Institute, Tokyo, Japan
| | - Kazuma Nakatani
- Department of Molecular Carcinogenesis, Chiba Cancer Centre Research Institute, Chiba, Japan
- Department of Molecular Biology and Oncology, Chiba University School of Medicine, Chiba, Japan
- Innovative Medicine CHIBA Doctoral WISE Program, Chiba University School of Medicine, Chiba, Japan
| | - Hiroyuki Kogashi
- Department of Molecular Carcinogenesis, Chiba Cancer Centre Research Institute, Chiba, Japan
- Department of Molecular Biology and Oncology, Chiba University School of Medicine, Chiba, Japan
| | - Miho Kobatake
- Department of Molecular Carcinogenesis, Chiba Cancer Centre Research Institute, Chiba, Japan
| | - Takashi Makino
- Laboratory of Evolutionary Genomics, Graduate School of Life Sciences, Tohoku University, Sendai, Japan
| |
|
36
|
Weisman CM, Murray AW, Eddy SR. Mixing genome annotation methods in a comparative analysis inflates the apparent number of lineage-specific genes. Curr Biol 2022; 32:2632-2639.e2. [PMID: 35588743 DOI: 10.1016/j.cub.2022.04.085] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 01/04/2022] [Revised: 03/17/2022] [Accepted: 04/21/2022] [Indexed: 12/16/2022]
Abstract
Comparisons of genomes of different species are used to identify lineage-specific genes, those genes that appear unique to one species or clade. Lineage-specific genes are often thought to represent genetic novelty that underlies unique adaptations. Identification of these genes depends not only on genome sequences, but also on inferred gene annotations. Comparative analyses typically use available genomes that have been annotated using different methods, increasing the risk that orthologous DNA sequences may be erroneously annotated as a gene in one species but not another, appearing lineage specific as a result. To evaluate the impact of such "annotation heterogeneity," we identified four clades of species with sequenced genomes with more than one publicly available gene annotation, allowing us to compare the number of lineage-specific genes inferred when differing annotation methods are used to those resulting when annotation method is uniform across the clade. In these case studies, annotation heterogeneity increases the apparent number of lineage-specific genes by up to 15-fold, suggesting that annotation heterogeneity is a substantial source of potential artifact.
Affiliation(s)
- Caroline M Weisman
- Lewis-Sigler Institute for Integrative Genomics, Carl Icahn Laboratory, Princeton University, South Drive, Princeton, NJ 08540, USA
| | - Andrew W Murray
- Department of Molecular & Cellular Biology, Harvard University, Divinity Avenue, Cambridge, MA 02138, USA
| | - Sean R Eddy
- Department of Molecular & Cellular Biology, Harvard University, Divinity Avenue, Cambridge, MA 02138, USA; Howard Hughes Medical Institute, Jones Bridge Road, Chevy Chase, MD 20815, USA; John A. Paulson School of Engineering and Applied Sciences, Harvard University, Oxford Street, Cambridge, MA 02138, USA
| |
|
37
|
Leong AZX, Lee PY, Mohtar MA, Syafruddin SE, Pung YF, Low TY. Short open reading frames (sORFs) and microproteins: an update on their identification and validation measures. J Biomed Sci 2022; 29:19. [PMID: 35300685 PMCID: PMC8928697 DOI: 10.1186/s12929-022-00802-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 12/31/2021] [Accepted: 03/09/2022] [Indexed: 12/17/2022] Open
Abstract
A short open reading frame (sORF) comprises ≤ 300 bases and encodes a microprotein or sORF-encoded protein (SEP) of ≤ 100 amino acids. Traditionally dismissed by genome annotation pipelines as meaningless noise, sORFs were found to possess coding potential with ribosome profiling (RIBO-Seq), which unveiled sORF-based transcripts at various genome locations. Nonetheless, the existence of corresponding stable, functional microproteins was initially little substantiated by experimental evidence. With recent advancements in multi-omics, the identification, validation, and functional characterisation of sORFs and microproteins have become feasible. In this review, we discuss the history and development of the emerging research field of sORFs and microproteins. In particular, we focus on an array of bioinformatics and OMICS approaches used for predicting, sequencing, validating, and characterizing these recently discovered entities. These strategies include RIBO-Seq, which detects sORF transcripts via ribosome footprints, and mass spectrometry (MS)-based proteomics for sequencing the resultant microproteins. Subsequently, our discussion extends to the functional characterisation of microproteins by incorporating CRISPR/Cas9 screens and protein–protein interaction (PPI) studies. Our review discusses not only detection methodologies but also the challenges and potential solutions in identifying and validating sORFs and their microproteins. The novelty of this review lies in its validation of the functional role of microproteins, which could contribute towards the future landscape of microproteomics.
Affiliation(s)
- Alyssa Zi-Xin Leong
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000, Kuala Lumpur, Malaysia
| | - Pey Yee Lee
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000, Kuala Lumpur, Malaysia
| | - M Aiman Mohtar
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000, Kuala Lumpur, Malaysia
| | - Saiful Effendi Syafruddin
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000, Kuala Lumpur, Malaysia
| | - Yuh-Fen Pung
- Division of Biomedical Science, School of Pharmacy, University of Nottingham Malaysia, Semenyih, 43500, Selangor, Malaysia
| | - Teck Yew Low
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000, Kuala Lumpur, Malaysia
| |
|
38
|
Nikolaidis M, Markoulatos P, Van de Peer Y, Oliver SG, Amoutzias GD. The Neighborhood of the Spike Gene Is a Hotspot for Modular Intertypic Homologous and Nonhomologous Recombination in Coronavirus Genomes. Mol Biol Evol 2022; 39:msab292. [PMID: 34638137 PMCID: PMC8549283 DOI: 10.1093/molbev/msab292] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Indexed: 12/16/2022] Open
Abstract
Coronaviruses (CoVs) have very large RNA viral genomes with a distinct genomic architecture of core and accessory open reading frames (ORFs). It is of utmost importance to understand their patterns and limits of homologous and nonhomologous recombination, because such events may affect the emergence of novel CoV strains and alter their host range, infection rate, tissue tropism, pathogenicity, and ability to escape vaccination programs. Intratypic recombination among closely related CoVs of the same subgenus has often been reported; however, the patterns and limits of genomic exchange between more distantly related CoV lineages (intertypic recombination) need further investigation. Here, we report computational/evolutionary analyses that clearly demonstrate a substantial ability for CoVs of different subgenera to recombine. Furthermore, we show that CoVs can obtain, through nonhomologous recombination, accessory ORFs from core ORFs, and can exchange accessory ORFs with different CoV genera, with other viruses (i.e., toroviruses, influenza C/D, reoviruses, rotaviruses, astroviruses), and even with hosts. Intriguingly, most of these radical events result from double crossovers surrounding the Spike ORF, thus highlighting both the instability and mobile nature of this genomic region. Although many such events have often occurred during the evolution of various CoVs, the genomic architecture of the relatively young SARS-CoV/SARS-CoV-2 lineage so far appears to be stable.
Affiliation(s)
- Marios Nikolaidis
- Bioinformatics Laboratory, Department of Biochemistry and Biotechnology, University of Thessaly, Larissa, Greece
| | - Panayotis Markoulatos
- Microbial Biotechnology-Molecular Bacteriology-Virology Laboratory, Department of Biochemistry and Biotechnology, University of Thessaly, Larissa, Greece
| | - Yves Van de Peer
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Center for Plant Systems Biology, VIB, Ghent, Belgium
- Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria, South Africa
- College of Horticulture, Nanjing Agricultural University, Nanjing, China
| | - Stephen G Oliver
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
| | - Grigorios D Amoutzias
- Bioinformatics Laboratory, Department of Biochemistry and Biotechnology, University of Thessaly, Larissa, Greece
| |
|
39
|
Cherezov RO, Vorontsova JE, Simonova OB. The Phenomenon of Evolutionary “De Novo Generation” of Genes. Russ J Dev Biol 2021. [DOI: 10.1134/s1062360421060035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
40
|
Lee J, Wacholder A, Carvunis AR. Evolutionary Characterization of the Short Protein SPAAR. Genes (Basel) 2021; 12:genes12121864. [PMID: 34946813 PMCID: PMC8702040 DOI: 10.3390/genes12121864] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/23/2021] [Revised: 11/22/2021] [Accepted: 11/22/2021] [Indexed: 02/07/2023] Open
Abstract
Microproteins (<100 amino acids) are receiving increasing recognition as important participants in numerous biological processes, but their evolutionary dynamics are poorly understood. SPAAR is a recently discovered microprotein that regulates muscle regeneration and angiogenesis through interactions with conserved signaling pathways. Interestingly, SPAAR does not belong to any known protein family and has known homologs exclusively among placental mammals. This lack of distant homology could be caused by challenges in homology detection of short sequences, or it could indicate a recent de novo emergence from a noncoding sequence. By integrating syntenic alignments and homology searches, we identify SPAAR orthologs in marsupials and monotremes, establishing that SPAAR has existed at least since the emergence of mammals. SPAAR shows substantial primary sequence divergence but retains a conserved protein structure. In primates, we infer two independent evolutionary events leading to the de novo origination of 5' elongated isoforms of SPAAR from a noncoding sequence and find evidence of adaptive evolution in this extended region. Thus, SPAAR may be of ancient origin, but it appears to be experiencing continual evolutionary innovation in mammals.
Affiliation(s)
- Jiwon Lee
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Joint CMU-Pitt Ph.D. Program in Computational Biology, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Aaron Wacholder
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Correspondence: Tel.: +1-412-648-3335
| |
|
41
|
Watson AK, Lopez P, Bapteste E. Hundreds of out-of-frame remodelled gene families in the E. coli pangenome. Mol Biol Evol 2021; 39:6430988. [PMID: 34792602 PMCID: PMC8788219 DOI: 10.1093/molbev/msab329] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/01/2023] Open
Abstract
All genomes include gene families with very limited taxonomic distributions that potentially represent new genes and innovations in protein-coding sequence, raising questions on the origins of such genes. Some of these genes are hypothesized to have formed de novo, from noncoding sequences, and recent work has begun to elucidate the processes by which de novo gene formation can occur. A special case of de novo gene formation, overprinting, describes the origin of new genes from noncoding alternative reading frames of existing open reading frames (ORFs). We argue that additionally, out-of-frame gene fission/fusion events of alternative reading frames of ORFs and out-of-frame lateral gene transfers could contribute to the origin of new gene families. To demonstrate this, we developed an original pattern-search in sequence similarity networks, enhancing the use of these graphs, commonly used to detect in-frame remodeled genes. We applied this approach to gene families in 524 complete genomes of Escherichia coli. We identified 767 gene families whose evolutionary history likely included at least one out-of-frame remodeling event. These genes with out-of-frame components represent ∼2.5% of all genes in the E. coli pangenome, suggesting that alternative reading frames of existing ORFs can contribute to a significant proportion of de novo genes in bacteria.
Affiliation(s)
- Andrew K Watson
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Sorbonne Université, CNRS, Museum National d'Histoire Naturelle, EPHE, Université des Antilles, 7, quai Saint Bernard, Paris, 75005, France
| | - Philippe Lopez
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Sorbonne Université, CNRS, Museum National d'Histoire Naturelle, EPHE, Université des Antilles, 7, quai Saint Bernard, Paris, 75005, France
| | - Eric Bapteste
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Sorbonne Université, CNRS, Museum National d'Histoire Naturelle, EPHE, Université des Antilles, 7, quai Saint Bernard, Paris, 75005, France
| |
|
42
|
Lyu Y, Liufu Z, Xiao J, Tang T. A Rapid Evolving microRNA Cluster Rewires Its Target Regulatory Networks in Drosophila. Front Genet 2021; 12:760530. [PMID: 34777478 PMCID: PMC8581666 DOI: 10.3389/fgene.2021.760530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2021] [Accepted: 10/11/2021] [Indexed: 11/13/2022] Open
Abstract
New miRNAs are evolutionarily important, but their functional evolution remains unclear. Here we report that the evolution of a microRNA cluster, mir-972C, rewires its downstream regulatory networks in Drosophila. Genomic analysis reveals that mir-972C originated in the common ancestor of Drosophila, where it comprises six old miRNAs. It subsequently recruited six new members in the melanogaster subgroup after evolving for at least 50 million years. Both the young and the old mir-972C members evolved rapidly in seed and non-seed regions. Combining target prediction and cell transfection experiments, we found that the seed and non-seed changes in individual mir-972C members cause extensive target divergence among D. melanogaster, D. simulans, and D. virilis, consistent with the functional evolution of mir-972C reported recently. Intriguingly, the target pool of the cluster as a whole remains relatively conserved. Our results suggest that clustering of young and old miRNAs broadens the target repertoires by acquiring new targets without losing many old ones. This may facilitate the establishment of new miRNAs in existing regulatory networks.
Affiliation(s)
- Yang Lyu
- State Key Laboratory of Biocontrol and Guangdong Key Laboratory of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Zhongqi Liufu
- State Key Laboratory of Biocontrol and Guangdong Key Laboratory of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Juan Xiao
- State Key Laboratory of Biocontrol and Guangdong Key Laboratory of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Tian Tang
- State Key Laboratory of Biocontrol and Guangdong Key Laboratory of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| |
|
43
|
Zhuang X, Cheng CHC. Propagation of a De Novo Gene under Natural Selection: Antifreeze Glycoprotein Genes and Their Evolutionary History in Codfishes. Genes (Basel) 2021; 12:genes12111777. [PMID: 34828383 PMCID: PMC8622921 DOI: 10.3390/genes12111777] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/23/2021] [Revised: 11/08/2021] [Accepted: 11/08/2021] [Indexed: 11/16/2022] Open
Abstract
The de novo birth of functional genes from non-coding DNA as an important contributor to new gene formation is increasingly supported by evidence from diverse eukaryotic lineages. However, many uncertainties remain, including how the incipient de novo genes would continue to evolve and the molecular mechanisms underlying their evolutionary trajectory. Here we address these questions by investigating the evolutionary history of the de novo antifreeze glycoprotein (AFGP) gene and gene family in gadid (codfish) lineages. We examined the AFGP phenotype in a phylogenetic framework encompassing a broad sampling of gadids from freezing and non-freezing habitats. In three select species representing different AFGP-bearing clades, we analyzed all AFGP gene family members and the broader scale AFGP genomic regions in detail. Codon usage analyses suggest that motif duplication produced the intragenic AFGP tripeptide coding repeats, and rapid sequence divergence post-duplication stabilized the recombination-prone long repetitive coding region. Genomic loci analyses support that AFGP originated once from a single ancestral genomic origin, and shed light on how the de novo gene proliferated into a gene family. Results also show that the processes of gene duplication and gene loss are distinctive in separate clades, and both genotype and phenotype are commensurate with differential local selective pressures.
Affiliation(s)
- Xuan Zhuang
- Department of Biological Sciences, University of Arkansas, Fayetteville, AR 72701, USA
- Correspondence: (X.Z.); (C.-H.C.C.)
| | - C.-H. Christina Cheng
- Department of Evolution, Ecology, and Behavior, University of Illinois, Urbana-Champaign, IL 61801, USA
- Correspondence: (X.Z.); (C.-H.C.C.)
| |
|
44
|
Matsuo T, Nakatani K, Setoguchi T, Matsuo K, Tamada T, Suenaga Y. Secondary Structure of Human De Novo Evolved Gene Product NCYM Analyzed by Vacuum-Ultraviolet Circular Dichroism. Front Oncol 2021; 11:688852. [PMID: 34497756 PMCID: PMC8420857 DOI: 10.3389/fonc.2021.688852] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/31/2021] [Accepted: 07/31/2021] [Indexed: 11/29/2022] Open
Abstract
NCYM, a cis-antisense gene of MYCN, encodes a Homininae-specific protein that promotes the aggressiveness of human tumors. Newly evolved genes from non-genic regions are known as de novo genes, and NCYM was the first de novo gene whose oncogenic functions were validated in vivo. Targeting NCYM using drugs is a potential strategy for cancer therapy; however, the NCYM structure must be determined before drug design. In this study, we employed vacuum-ultraviolet circular dichroism to evaluate the secondary structure of NCYM. The SUMO-tagged NCYM and the isolated SUMO tag in both hydrogenated and perdeuterated forms were synthesized and purified in a cell-free in vitro system, and vacuum-ultraviolet circular dichroism spectra were measured. Significant differences between the tagged NCYM and the isolated tag were evident in the wavelength range of 190–240 nm. The circular dichroism spectral data combined with a neural network system enabled prediction of the secondary structure of NCYM at the amino acid level. The 129-residue tag consists of α-helices (approximately 14%) and β-strands (approximately 29%), which corresponded to the values calculated from the atomic structure of the tag. The 238-residue tagged NCYM contained approximately 17% α-helices and 27% β-strands. The location of the secondary structure predicted using the neural network revealed that these secondary structures were enriched in the Homininae-specific region of NCYM. Deuteration of NCYM altered the secondary structure at D90 from an α-helix to a structure other than α-helix or β-strand, although this change was within the experimental error range. All four nonsynonymous single-nucleotide polymorphisms (SNPs) in human populations were in this region, and the amino acid alteration in SNP N52S enhanced Myc-nick production. The D90N mutation in NCYM promoted NCYM-mediated MYCN stabilization. Our results reveal the secondary structure of NCYM and demonstrate that the Homininae-specific domain of NCYM is responsible for MYCN stabilization.
Affiliation(s)
- Tatsuhito Matsuo
- Institute for Quantum Life Science, National Institutes for Quantum and Radiological Science and Technology, Ibaraki, Japan
| | - Kazuma Nakatani
- Department of Molecular Carcinogenesis, Chiba Cancer Center Research Institute, Chiba, Japan
- Graduate School of Medical and Pharmaceutical Sciences, Chiba University, Chiba, Japan
- Innovative Medicine CHIBA Doctoral World-leading Innovative & Smart Education (WISE) Program, Chiba University, Chiba, Japan
| | - Taiki Setoguchi
- Department of Molecular Carcinogenesis, Chiba Cancer Center Research Institute, Chiba, Japan
- Department of Neurosurgery, Chiba Cancer Center, Chiba, Japan
| | - Koichi Matsuo
- Hiroshima Synchrotron Radiation Center, Hiroshima University, Hiroshima, Japan
| | - Taro Tamada
- Institute for Quantum Life Science, National Institutes for Quantum and Radiological Science and Technology, Ibaraki, Japan
| | - Yusuke Suenaga
- Department of Molecular Carcinogenesis, Chiba Cancer Center Research Institute, Chiba, Japan
| |
|
45
|
Rivard EL, Ludwig AG, Patel PH, Grandchamp A, Arnold SE, Berger A, Scott EM, Kelly BJ, Mascha GC, Bornberg-Bauer E, Findlay GD. A putative de novo evolved gene required for spermatid chromatin condensation in Drosophila melanogaster. PLoS Genet 2021; 17:e1009787. [PMID: 34478447 PMCID: PMC8445463 DOI: 10.1371/journal.pgen.1009787] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 06/29/2021] [Revised: 09/16/2021] [Accepted: 08/19/2021] [Indexed: 02/07/2023] Open
Abstract
Comparative genomics has enabled the identification of genes that potentially evolved de novo from non-coding sequences. Many such genes are expressed in male reproductive tissues, but their functions remain poorly understood. To address this, we conducted a functional genetic screen of over 40 putative de novo genes with testis-enriched expression in Drosophila melanogaster and identified one gene, atlas, required for male fertility. Detailed genetic and cytological analyses showed that atlas is required for proper chromatin condensation during the final stages of spermatogenesis. Atlas protein is expressed in spermatid nuclei and facilitates the transition from histone- to protamine-based chromatin packaging. Complementary evolutionary analyses revealed the complex evolutionary history of atlas. The protein-coding portion of the gene likely arose at the base of the Drosophila genus on the X chromosome but was unlikely to be essential, as it was then lost in several independent lineages. Within the last ~15 million years, however, the gene moved to an autosome, where it fused with a conserved non-coding RNA and evolved a non-redundant role in male fertility. Altogether, this study provides insight into the integration of novel genes into biological processes, the links between genomic innovation and functional evolution, and the genetic control of a fundamental developmental process, gametogenesis.
Affiliation(s)
- Emily L. Rivard
- College of the Holy Cross, Worcester, Massachusetts, United States of America
| | - Andrew G. Ludwig
- College of the Holy Cross, Worcester, Massachusetts, United States of America
| | - Prajal H. Patel
- College of the Holy Cross, Worcester, Massachusetts, United States of America
| | | | - Sarah E. Arnold
- College of the Holy Cross, Worcester, Massachusetts, United States of America
| | | | - Emilie M. Scott
- College of the Holy Cross, Worcester, Massachusetts, United States of America
| | - Brendan J. Kelly
- College of the Holy Cross, Worcester, Massachusetts, United States of America
| | - Grace C. Mascha
- College of the Holy Cross, Worcester, Massachusetts, United States of America
| | - Erich Bornberg-Bauer
- University of Münster, Münster, Germany
- Max Planck Institute for Developmental Biology, Tübingen, Germany
| | - Geoffrey D. Findlay
- College of the Holy Cross, Worcester, Massachusetts, United States of America
| |
|
46
|
Li J, Singh U, Arendsee Z, Wurtele ES. Landscape of the Dark Transcriptome Revealed Through Re-mining Massive RNA-Seq Data. Front Genet 2021; 12:722981. [PMID: 34484307 PMCID: PMC8415361 DOI: 10.3389/fgene.2021.722981] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/09/2021] [Accepted: 07/26/2021] [Indexed: 12/13/2022] Open
Abstract
The "dark transcriptome" can be considered the multitude of sequences that are transcribed but not annotated as genes. We evaluated expression of 6,692 annotated genes and 29,354 unannotated open reading frames (ORFs) in the Saccharomyces cerevisiae genome across diverse environmental, genetic and developmental conditions (3,457 RNA-Seq samples). Over 30% of the highly transcribed ORFs have translation evidence. Phylostratigraphic analysis infers that most of these transcribed ORFs would encode species-specific proteins ("orphan-ORFs"); hundreds have mean expression comparable to annotated genes. These data reveal the unannotated ORFs most likely to be protein-coding genes. We partitioned a co-expression matrix by Markov Chain Clustering; the resultant clusters contain 2,468 orphan-ORFs. We provide the aggregated RNA-Seq yeast data with extensive metadata as a project in MetaOmGraph (MOG), a tool designed for interactive analysis and visualization. This approach enables reuse of public RNA-Seq data for exploratory discovery, providing a rich context for experimentalists to make novel, experimentally testable hypotheses about candidate genes.
Affiliation(s)
- Jing Li
- Genetics and Genomics Graduate Program, Iowa State University, Ames, IA, United States
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
| | - Urminder Singh
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
| | - Zebulun Arendsee
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
| | - Eve Syrkin Wurtele
- Genetics and Genomics Graduate Program, Iowa State University, Ames, IA, United States
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
| |
|
47
|
Genomic analyses of new genes and their phenotypic effects reveal rapid evolution of essential functions in Drosophila development. PLoS Genet 2021; 17:e1009654. [PMID: 34242211 PMCID: PMC8270118 DOI: 10.1371/journal.pgen.1009654] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 01/12/2021] [Accepted: 06/09/2021] [Indexed: 12/27/2022] Open
Abstract
It is a conventionally held dogma that the genetic basis underlying development is conserved over long evolutionary time scales. Ample experiments based on mutational, biochemical, functional, and complementary knockdown/knockout approaches have revealed the unexpectedly important role of recently evolved new genes in the development of Drosophila. Recent progress in the genome-wide experimental testing of gene effects and improvements in the computational identification of new genes (those that originated < 40 million years ago, Mya) open the door to investigating the evolution of gene essentiality at high phylogenetic resolution. These advancements also raised interesting issues in techniques and concepts related to phenotypic effect analyses of genes, particularly of those that recently originated. Here we report our analyses of these issues, including the reproducibility and efficiency of knockdown experiments and the differences between RNAi libraries in knockdown efficiency and in the testing of phenotypic effects. We further analyzed a large dataset from knockdowns of 11,354 genes (~75% of all Drosophila melanogaster genes), including 702 new genes (~66% of the species' new genes aged < 40 Mya), revealing a similarly high proportion (~32.2%) of essential genes that originated in various Sophophora subgenus lineages and distant ancestors beyond the Drosophila genus. Transcriptional compensation effects from CRISPR knockout were detected for highly similar duplicate copies. Knockout of a few young genes revealed analogous essentiality in various functions in development. Taken together, our experimental and computational analyses provide valuable data for the detection of phenotypic effects of genes in general and further strong evidence for the concept that new genes in Drosophila quickly evolved essential functions in viability during development.
|
48
|
Rödelsperger C, Ebbing A, Sharma DR, Okumura M, Sommer RJ, Korswagen HC. Spatial Transcriptomics of Nematodes Identifies Sperm Cells as a Source of Genomic Novelty and Rapid Evolution. Mol Biol Evol 2021; 38:229-243. [PMID: 32785688 PMCID: PMC8480184 DOI: 10.1093/molbev/msaa207] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 12/13/2022] Open
Abstract
Divergence of gene function and expression during development can give rise to phenotypic differences at the level of cells, tissues, organs, and ultimately whole organisms. To gain insights into the evolution of gene expression and novel genes at spatial resolution, we compared the spatially resolved transcriptomes of two distantly related nematodes, Caenorhabditis elegans and Pristionchus pacificus, that diverged 60–90 Ma. The spatial transcriptomes of adult worms show little evidence for strong conservation at the level of single genes. Instead, regional expression is largely driven by recent duplication and emergence of novel genes. Estimation of gene ages across anatomical structures revealed an enrichment of novel genes in sperm-related regions. This provides the first evidence in nematodes for the “out of testis” hypothesis that has been previously postulated based on studies in Drosophila and mammals. “Out of testis” genes represent a mix of products of pervasive transcription as well as fast evolving members of ancient gene families. Strikingly, numerous novel genes have known functions during meiosis in Caenorhabditis elegans, indicating that even universal processes such as meiosis may be targets of rapid evolution. Our study highlights the importance of novel genes in generating phenotypic diversity and explicitly characterizes gene origination in sperm-related regions. Furthermore, it proposes new functions for previously uncharacterized genes and establishes the spatial transcriptome of Pristionchus pacificus as a catalog for future studies on the evolution of gene expression and function.
Affiliation(s)
- Christian Rödelsperger
- Department for Integrative Evolutionary Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
| | - Annabel Ebbing
- Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences and University Medical Center Utrecht, Utrecht, The Netherlands
| | - Devansh Raj Sharma
- Department for Integrative Evolutionary Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
| | - Misako Okumura
- Program of Biomedical Science, Graduate School of Integrated Sciences for Life, Hiroshima University, Higashi-Hiroshima, Hiroshima, Japan
| | - Ralf J Sommer
- Department for Integrative Evolutionary Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
| | - Hendrik C Korswagen
- Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences and University Medical Center Utrecht, Utrecht, The Netherlands
- Developmental Biology, Department of Biology, Institute of Biodynamics and Biocomplexity, Utrecht University, Utrecht, The Netherlands
| |
|
49
|
Hata T, Takada N, Hayakawa C, Kazama M, Uchikoba T, Tachikawa M, Matsuo M, Satoh S, Obokata J. De novo activated transcription of inserted foreign coding sequences is inheritable in the plant genome. PLoS One 2021; 16:e0252674. [PMID: 34111139 PMCID: PMC8191969 DOI: 10.1371/journal.pone.0252674] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/23/2020] [Accepted: 05/19/2021] [Indexed: 01/16/2023] Open
Abstract
The manner in which inserted foreign coding sequences become transcriptionally activated and fixed in the plant genome is poorly understood. To examine such processes of gene evolution, we performed an artificial evolutionary experiment in Arabidopsis thaliana. As a model of gene-birth events, we introduced a promoterless coding sequence of the firefly luciferase (LUC) gene and established 386 T2-generation transgenic lines. Among them, we determined the individual LUC insertion loci in 76 lines and found that one-third of them were transcribed de novo even in the intergenic or inherently unexpressed regions. In the transcribed lines, transcription-related chromatin marks were detected across the newly activated transcribed regions. These results agreed with our previous findings in A. thaliana cultured cells under a similar experimental scheme. A comparison of the results of the T2-plant and cultured cell experiments revealed that the de novo-activated transcription concomitant with local chromatin remodelling was inheritable. During one-generation inheritance, it seems likely that the transcription activities of the LUC inserts trapped by the endogenous genes/transcripts became stronger, while those of de novo transcription in the intergenic/untranscribed regions became weaker. These findings may offer a clue for the elucidation of the mechanism by which inserted foreign coding sequences become transcriptionally activated and fixed in the plant genome.
Affiliation(s)
- Takayuki Hata
- Graduate School of Life and Environmental Sciences, Kyoto Prefectural University, Kyoto-shi, Kyoto, Japan
- Faculty of Agriculture, Setsunan University, Hirakata-shi, Osaka, Japan
| | - Naoto Takada
- Graduate School of Life and Environmental Sciences, Kyoto Prefectural University, Kyoto-shi, Kyoto, Japan
| | - Chihiro Hayakawa
- Graduate School of Life and Environmental Sciences, Kyoto Prefectural University, Kyoto-shi, Kyoto, Japan
| | - Mei Kazama
- Graduate School of Life and Environmental Sciences, Kyoto Prefectural University, Kyoto-shi, Kyoto, Japan
| | - Tomohiro Uchikoba
- Faculty of Life and Environmental Sciences, Kyoto Prefectural University, Kyoto-shi, Kyoto, Japan
| | - Makoto Tachikawa
- Graduate School of Life and Environmental Sciences, Kyoto Prefectural University, Kyoto-shi, Kyoto, Japan
| | - Mitsuhiro Matsuo
- Faculty of Agriculture, Setsunan University, Hirakata-shi, Osaka, Japan
| | - Soichirou Satoh
- Graduate School of Life and Environmental Sciences, Kyoto Prefectural University, Kyoto-shi, Kyoto, Japan
- Faculty of Life and Environmental Sciences, Kyoto Prefectural University, Kyoto-shi, Kyoto, Japan
| | - Junichi Obokata
- Faculty of Agriculture, Setsunan University, Hirakata-shi, Osaka, Japan
| |
|
50
|
Zhao Y, Lu GA, Yang H, Lin P, Liufu Z, Tang T, Xu J. Run or Die in the Evolution of New MicroRNAs-Testing the Red Queen Hypothesis on De Novo New Genes. Mol Biol Evol 2021; 38:1544-1553. [PMID: 33306129 PMCID: PMC8042761 DOI: 10.1093/molbev/msaa317] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 02/06/2023] Open
Abstract
The Red Queen hypothesis depicts evolution as the continual struggle to adapt. According to this hypothesis, new genes, especially those originating from nongenic sequences (i.e., de novo genes), are eliminated unless they evolve continually in adaptation to a changing environment. Here, we analyze two Drosophila de novo miRNAs that are expressed in a testis-specific manner with very high rates of evolution in their DNA sequence. We knocked out these miRNAs in two sibling species and investigated their contributions to different fitness components. We observed that the fitness contributions of miR-975 in Drosophila simulans seem positive, in contrast to its neutral contributions in D. melanogaster, whereas miR-983 appears to have negative contributions in both species, as the fitness of the knockout mutant increases. As predicted by the Red Queen hypothesis, the fitness difference of these de novo miRNAs indicates their different fates.
Affiliation(s)
- Yixin Zhao
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, Guangdong, China
| | - Guang-An Lu
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, Guangdong, China
| | - Hao Yang
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, Guangdong, China
| | - Pei Lin
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, Guangdong, China
| | - Zhongqi Liufu
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, Guangdong, China
| | - Tian Tang
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, Guangdong, China
| | - Jin Xu
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, Guangdong, China
| |
|