1
Rich A, Acar O, Carvunis AR. Massively integrated coexpression analysis reveals transcriptional regulation, evolution and cellular implications of the yeast noncanonical translatome. Genome Biol 2024; 25:183. [PMID: 38978079 PMCID: PMC11232214 DOI: 10.1186/s13059-024-03287-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Accepted: 05/20/2024] [Indexed: 07/10/2024]
Abstract
BACKGROUND Recent studies uncovered pervasive transcription and translation of thousands of noncanonical open reading frames (nORFs) outside of annotated genes. The contribution of nORFs to cellular phenotypes is difficult to infer using conventional approaches because nORFs tend to be short, of recent de novo origins, and lowly expressed. Here we develop a dedicated coexpression analysis framework that accounts for low expression to investigate the transcriptional regulation, evolution, and potential cellular roles of nORFs in Saccharomyces cerevisiae. RESULTS Our results reveal that nORFs tend to be preferentially coexpressed with genes involved in cellular transport or homeostasis but rarely with genes involved in RNA processing. Mechanistically, we discover that young de novo nORFs located downstream of conserved genes tend to leverage their neighbors' promoters through transcription readthrough, resulting in high coexpression and high expression levels. Transcriptional piggybacking also influences the coexpression profiles of young de novo nORFs located upstream of genes, but to a lesser extent and without detectable impact on expression levels. Transcriptional piggybacking influences, but does not determine, the transcription profiles of de novo nORFs emerging near genes. About 40% of nORFs are not strongly coexpressed with any gene but are transcriptionally regulated nonetheless and tend to form entirely new transcription modules. We offer a web browser interface ( https://carvunislab.csb.pitt.edu/shiny/coexpression/ ) to efficiently query, visualize, and download our coexpression inferences. CONCLUSIONS Our results suggest that nORF transcription is highly regulated. Our coexpression dataset serves as an unprecedented resource for unraveling how nORFs integrate into cellular networks, contribute to cellular phenotypes, and evolve.
Affiliation(s)
- April Rich
- Joint Carnegie Mellon University-University of Pittsburgh, University of Pittsburgh Computational Biology PhD Program, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), University of Pittsburgh, Pittsburgh, PA, USA
- Omer Acar
- Joint Carnegie Mellon University-University of Pittsburgh, University of Pittsburgh Computational Biology PhD Program, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), University of Pittsburgh, Pittsburgh, PA, USA
- Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), University of Pittsburgh, Pittsburgh, PA, USA
2
uz-Zaman MH, D’Alton S, Barrick JE, Ochman H. Promoter recruitment drives the emergence of proto-genes in a long-term evolution experiment with Escherichia coli. PLoS Biol 2024; 22:e3002418. [PMID: 38713714 PMCID: PMC11101190 DOI: 10.1371/journal.pbio.3002418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 05/17/2024] [Accepted: 04/18/2024] [Indexed: 05/09/2024]
Abstract
The phenomenon of de novo gene birth, the emergence of genes from non-genic sequences, has received considerable attention due to the widespread occurrence of genes that are unique to particular species or genomes. Most instances of de novo gene birth have been recognized through comparative analyses of genome sequences in eukaryotes, despite the abundance of novel, lineage-specific genes in bacteria and the relative ease with which bacteria can be studied in an experimental context. Here, we explore the genetic record of the Escherichia coli long-term evolution experiment (LTEE) for changes indicative of "proto-genic" phases of new gene birth in which non-genic sequences evolve stable transcription and/or translation. Over the time span of the LTEE, non-genic regions are frequently transcribed, translated and differentially expressed, with levels of transcription across low-expressed regions increasing in later generations of the experiment. Proto-genes formed downstream of new mutations result either from insertion element activity or chromosomal translocations that fused preexisting regulatory sequences to regions that were not expressed in the LTEE ancestor. Additionally, we identified instances of proto-gene emergence in which a previously unexpressed sequence was transcribed after formation of an upstream promoter, although such cases were rare compared to those caused by recruitment of preexisting promoters. Tracing the origin of the causative mutations, we discovered that most occurred early in the history of the LTEE, often within the first 20,000 generations, and became fixed soon after emergence. Our findings show that proto-genes emerge frequently within evolving populations, can persist stably, and can serve as potential substrates for new gene formation.
Affiliation(s)
- Md. Hassan uz-Zaman
- Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
- Simon D’Alton
- Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
- Jeffrey E. Barrick
- Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
- Howard Ochman
- Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
3
Singh AK. Rules and impacts of nonsense-mediated mRNA decay in the degradation of long noncoding RNAs. Wiley Interdiscip Rev RNA 2024; 15:e1853. [PMID: 38741356 DOI: 10.1002/wrna.1853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Revised: 04/15/2024] [Accepted: 04/15/2024] [Indexed: 05/16/2024]
Abstract
Nonsense-mediated mRNA decay (NMD) is a quality-control process that selectively degrades mRNAs carrying a premature termination codon, an upstream open reading frame, or an unusually long 3'UTR. NMD detects such mRNAs and rapidly degrades them during the initial rounds of translation in eukaryotic cells. Because NMD is a translation-dependent cytoplasmic mRNA surveillance process, noncoding RNAs were initially believed to be NMD-resistant. Sequence feature-based analyses have revealed that many putative long noncoding RNAs (lncRNAs) contain short open reading frames, most of which have translation potential. Subsequent transcriptome-based molecular studies showed that a large set of such putative lncRNAs associate with translating ribosomes, and some of them produce stable and functionally active micropeptides. Translationally active lncRNAs typically have relatively long, unprotected 3'UTRs, which can induce their NMD-dependent degradation. This review describes the mechanism and regulation of NMD-dependent degradation of lncRNAs and its impact on biological processes related to the functions of lncRNAs or their encoded micropeptides. This article is categorized under: RNA Turnover and Surveillance > Turnover/Surveillance Mechanisms; RNA Turnover and Surveillance > Regulation of RNA Stability; RNA in Disease and Development > RNA in Disease.
Affiliation(s)
- Anand Kumar Singh
- Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati, Andhra Pradesh, India
4
Liu X, Xiao C, Xu X, Zhang J, Mo F, Chen JY, Delihas N, Zhang L, An NA, Li CY. Origin of functional de novo genes in humans from "hopeful monsters". Wiley Interdiscip Rev RNA 2024; 15:e1845. [PMID: 38605485 DOI: 10.1002/wrna.1845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 03/13/2024] [Accepted: 03/18/2024] [Indexed: 04/13/2024]
Abstract
For a long time, it was believed that new genes arise only from modifications of preexisting genes, but the discovery of de novo protein-coding genes that originated from noncoding DNA regions demonstrates the existence of a "motherless" origination process for new genes. However, the features, distributions, expression profiles, and origin modes of these genes in humans seem to support the notion that their origin is not a purely "motherless" process; rather, these genes arise preferentially from genomic regions encoding preexisting precursors with gene-like features. In such a case, the gene loci are typically not brand new. In this short review, we will summarize the definition and features of human de novo genes and clarify their process of origination from ancestral non-coding genomic regions. In addition, we define the favored precursors, or "hopeful monsters," for the origin of de novo genes and present a discussion of the functional significance of these young genes in brain development and tumorigenesis in humans. This article is categorized under: RNA Evolution and Genomics > RNA and Ribonucleoprotein Evolution.
Affiliation(s)
- Xiaoge Liu
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Chunfu Xiao
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Xinwei Xu
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Jie Zhang
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Fan Mo
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Jia-Yu Chen
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Chemistry and Biomedicine Innovation Center (ChemBIC), Nanjing University, Nanjing, China
- Nicholas Delihas
- Department of Microbiology and Immunology, Renaissance School of Medicine, Stony Brook University, Stony Brook, New York, USA
- Li Zhang
- Chinese Institute for Brain Research, Beijing, China
- Ni A An
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Chuan-Yun Li
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Chinese Institute for Brain Research, Beijing, China
- Southwest United Graduate School, Kunming, China
5
Lu Y, Ran Y, Li H, Wen J, Cui X, Zhang X, Guan X, Cheng M. Micropeptides: origins, identification, and potential role in metabolism-related diseases. J Zhejiang Univ Sci B 2023; 24:1106-1122. [PMID: 38057268 PMCID: PMC10710913 DOI: 10.1631/jzus.b2300128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Accepted: 06/06/2023] [Indexed: 12/08/2023]
Abstract
With the development of modern sequencing techniques and bioinformatics, genomic regions that were once thought to be noncoding have been found to encode abundant functional micropeptides (miPs), a class of small polypeptides. Although miPs are difficult to analyze and identify, a number of studies have begun to focus on them. A growing number of miPs have been shown to be essential for energy metabolism homeostasis, immune regulation, and tumor growth and development. Many reports have shown that miPs are particularly essential for regulating glucose and lipid metabolism and mitochondrial function. MiPs are also involved in the progression of related diseases. This paper reviews the sources and identification of miPs, as well as the functional significance of miPs for metabolism-related diseases, with the aim of revealing their potential clinical applications.
Affiliation(s)
- Min Cheng
- School of Basic Medicine Sciences, Weifang Medical University, Weifang 261053, China.
6
uz-Zaman MH, D'Alton S, Barrick JE, Ochman H. Promoter capture drives the emergence of proto-genes in Escherichia coli. bioRxiv 2023:2023.11.15.567300. [PMID: 38013999 PMCID: PMC10680751 DOI: 10.1101/2023.11.15.567300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
The phenomenon of de novo gene birth, the emergence of genes from non-genic sequences, has received considerable attention due to the widespread occurrence of genes that are unique to particular species or genomes. Most instances of de novo gene birth have been recognized through comparative analyses of genome sequences in eukaryotes, despite the abundance of novel, lineage-specific genes in bacteria and the relative ease with which bacteria can be studied in an experimental context. Here, we explore the genetic record of the Escherichia coli Long-Term Evolution Experiment (LTEE) for changes indicative of "proto-genic" phases of new gene birth in which non-genic sequences evolve stable transcription and/or translation. Over the time span of the LTEE, non-genic regions are frequently transcribed, translated and differentially expressed, thereby serving as raw material for new gene emergence. Most proto-genes result either from insertion element activity or chromosomal translocations that fused pre-existing regulatory sequences to regions that were not expressed in the LTEE ancestor. Additionally, we identified instances of proto-gene emergence in which a previously unexpressed sequence was transcribed after formation of an upstream promoter. Tracing the origin of the causative mutations, we discovered that most occurred early in the history of the LTEE, often within the first 20,000 generations, and became fixed soon after emergence. Our findings show that proto-genes emerge frequently within evolving populations, persist stably, and can serve as potential substrates for new gene formation.
7
Rizavi HS, Gavin HE, Krishnan HR, Gavin DP, Sharma RP. Ethanol- and PARP-Mediated Regulation of Ribosome-Associated Long Non-Coding RNA (lncRNA) in Pyramidal Neurons. Noncoding RNA 2023; 9:72. [PMID: 37987368 PMCID: PMC10661276 DOI: 10.3390/ncrna9060072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 10/23/2023] [Accepted: 11/03/2023] [Indexed: 11/22/2023]
Abstract
Although, by definition, long noncoding RNAs (lncRNAs) are not translated, they are sometimes associated with ribosomes. In fact, some estimates suggest the existence of more than 50,000 lncRNA molecules that could encode small peptides. We examined the effects of ethanol and of a poly-ADP ribose polymerase (PARP) inhibitor (ABT-888) on ribosome-bound lncRNAs. Mice received intraperitoneal (i.p.) injections of either normal saline (CTL) or ethanol (EtOH) twice a day for four consecutive days. On the fourth day, a subgroup of ethanol-treated mice also received ABT-888 (EtOH+ABT). Ribosome-bound lncRNAs in CaMKIIα-expressing pyramidal neurons were measured using the Translating Ribosome Affinity Purification (TRAP) technique. Our findings show that EtOH altered the ribosomal attachment of 107 lncRNA transcripts, while EtOH+ABT altered 60 lncRNAs. Of these 60 lncRNAs, 49 were altered under both conditions, while EtOH+ABT uniquely altered the attachment of 11 lncRNA transcripts that EtOH alone did not affect. To validate these results, we selected eight lncRNAs (Mir124-2hg, 5430416N02Rik, Snhg17, Snhg12, Snhg1, Mir9-3hg, Gas5, and 1110038B12Rik) for qRT-PCR analysis. The current study demonstrates that ethanol-induced changes in lncRNA attachment to ribosomes can be mitigated by the addition of the PARP inhibitor ABT-888.
Affiliation(s)
- Hooriyah S. Rizavi
- Department of Psychiatry, University of Illinois at Chicago, Chicago, IL 60612, USA
- Jesse Brown Veterans Affairs Medical Center, Chicago, IL 60612, USA
- Hannah E. Gavin
- Department of Psychiatry, University of Illinois at Chicago, Chicago, IL 60612, USA
- Harish R. Krishnan
- Center for Alcohol Research in Epigenetics, Department of Psychiatry, University of Illinois at Chicago, Chicago, IL 60612, USA
- David P. Gavin
- Department of Psychiatry, University of Illinois at Chicago, Chicago, IL 60612, USA
- Rajiv P. Sharma
- Department of Psychiatry, University of Illinois at Chicago, Chicago, IL 60612, USA
- Jesse Brown Veterans Affairs Medical Center, Chicago, IL 60612, USA
8
Wacholder A, Parikh SB, Coelho NC, Acar O, Houghton C, Chou L, Carvunis AR. A vast evolutionarily transient translatome contributes to phenotype and fitness. Cell Syst 2023; 14:363-381.e8. [PMID: 37164009 PMCID: PMC10348077 DOI: 10.1016/j.cels.2023.04.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/18/2022] [Revised: 01/30/2023] [Accepted: 04/06/2023] [Indexed: 05/12/2023]
Abstract
Translation is the process by which ribosomes synthesize proteins. Ribosome profiling recently revealed that many short sequences previously thought to be noncoding are pervasively translated. To identify protein-coding genes in this noncanonical translatome, we combine an integrative framework for extremely sensitive ribosome profiling analysis, iRibo, with high-powered selection inferences tailored for short sequences. We construct a reference translatome for Saccharomyces cerevisiae comprising 5,400 canonical and almost 19,000 noncanonical translated elements. Only 14 noncanonical elements were evolving under detectable purifying selection. A representative subset of translated elements lacking signatures of selection demonstrated involvement in processes including DNA repair, stress response, and post-transcriptional regulation. Our results suggest that most translated elements are not conserved protein-coding genes and contribute to genotype-phenotype relationships through fast-evolving molecular mechanisms.
Affiliation(s)
- Aaron Wacholder
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Saurin Bipin Parikh
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Integrative Systems Biology Program, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Nelson Castilho Coelho
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Omer Acar
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Joint CMU-Pitt PhD Program in Computational Biology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Carly Houghton
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Joint CMU-Pitt PhD Program in Computational Biology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Lin Chou
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Integrative Systems Biology Program, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA.
9
Han X, Li B, Zhang S. MIR503HG: A potential diagnostic and therapeutic target in human diseases. Biomed Pharmacother 2023; 160:114314. [PMID: 36736276 DOI: 10.1016/j.biopha.2023.114314] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/07/2022] [Revised: 01/20/2023] [Accepted: 01/26/2023] [Indexed: 02/05/2023]
Abstract
LncRNAs are involved in many physiological and pathological processes, including chromatin remodeling, transcription, posttranscriptional regulation of gene expression, mRNA stability, translation, and posttranslational modification, and their functions depend on their subcellular localization. MIR503HG is a lncRNA as well as the host gene of the miRNAs miR-503 and miR-424. MIR503HG functions independently or synergistically with miR-503. MIR503HG affects cell proliferation, invasion, metastasis, apoptosis, angiogenesis, and other biological behaviors. The mechanisms of MIR503HG in disease include interaction with proteins, sponging of miRNAs to regulate downstream target genes, and participation in the NF-κB, TGF-β, ERK/MAPK, and PI3K/AKT signaling pathways. In this review, we summarize the molecular mechanisms of MIR503HG in disease and its potential applications in diagnosis, prognosis, and treatment. We also raise some unanswered questions in this area, providing insights for future research.
Affiliation(s)
- Xue Han
- Department of Obstetrics and Gynecology, Shengjing Hospital of China Medical University, No.36 Sanhao Street, Shenyang, Liaoning Province, China.
- Bo Li
- Department of Obstetrics and Gynecology, Shengjing Hospital of China Medical University, No.36 Sanhao Street, Shenyang, Liaoning Province, China.
- Shitai Zhang
- Department of Obstetrics and Gynecology, Shengjing Hospital of China Medical University, No.36 Sanhao Street, Shenyang, Liaoning Province, China.
10
Evolution and implications of de novo genes in humans. Nat Ecol Evol 2023:10.1038/s41559-023-02014-y. [PMID: 36928843 DOI: 10.1038/s41559-023-02014-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 07/24/2022] [Accepted: 02/06/2023] [Indexed: 03/18/2023]
Abstract
Genes and translated open reading frames (ORFs) that emerged de novo from previously non-coding sequences provide species with opportunities for adaptation. When aberrantly activated, some human-specific de novo genes and ORFs have disease-promoting properties, for instance driving tumour growth. Thousands of putative de novo coding sequences have been described in humans, but we still do not know what fraction of those ORFs has readily acquired a function. Here, we discuss the challenges and controversies surrounding the detection, mechanisms of origin, annotation, validation and characterization of de novo genes and ORFs. Through manual curation of literature and databases, we provide a comprehensive table of most de novo genes reported for humans to date. We re-evaluate each locus by tracing the enabling mutations and list proposed disease associations, protein characteristics and supporting evidence for translation and protein detection. This work will support future explorations of de novo genes and ORFs in humans.
11
Vakirlis N, Vance Z, Duggan KM, McLysaght A. De novo birth of functional microproteins in the human lineage. Cell Rep 2022; 41:111808. [PMID: 36543139 PMCID: PMC10073203 DOI: 10.1016/j.celrep.2022.111808] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 10/18/2021] [Revised: 06/21/2022] [Accepted: 11/18/2022] [Indexed: 12/24/2022]
Abstract
Small open reading frames (sORFs) can encode functional "microproteins" that perform crucial biological tasks. However, their size makes them less amenable to genomic analysis, and their origins and conservation are poorly understood. Given their short length, it is plausible that some of these functional microproteins have recently originated entirely de novo from noncoding sequences. Here we sought to identify such cases in the human lineage by reconstructing the evolutionary origins of human microproteins previously found to have measurable, statistically significant fitness effects. By tracing the formation of each ORF and its transcriptional activation, we show that novel microproteins with significant phenotypic effects have emerged de novo throughout animal evolution, including two after the human-chimpanzee split. Notably, traditional methods for assessing coding potential would miss most of these cases. This evidence demonstrates that the functional potential intrinsic to sORFs can be relatively rapidly and frequently realized through de novo gene emergence.
Affiliation(s)
- Nikolaos Vakirlis
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center "Alexander Fleming", Vari, Greece.
- Zoe Vance
- Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, Ireland
- Kate M Duggan
- Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, Ireland
- Aoife McLysaght
- Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, Ireland.
12
Parikh SB, Houghton C, Van Oss SB, Wacholder A, Carvunis A. Origins, evolution, and physiological implications of de novo genes in yeast. Yeast 2022; 39:471-481. [PMID: 35959631 PMCID: PMC9544372 DOI: 10.1002/yea.3810] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/20/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 12/03/2022]
Abstract
De novo gene birth is the process by which new genes emerge in sequences that were previously noncoding. Over the past decade, researchers have taken advantage of the power of yeast as a model and a tool to study the evolutionary mechanisms and physiological implications of de novo gene birth. We summarize the mechanisms that have been proposed to explain how noncoding sequences can become protein-coding genes, highlighting the discovery of pervasive translation of the yeast transcriptome and its presumed impact on evolutionary innovation. We summarize current best practices for the identification and characterization of de novo genes. Crucially, we explain that the field is still in its infancy, with the physiological roles of most young yeast de novo genes identified thus far still unknown. We hope this review inspires researchers to investigate the true contribution of de novo gene birth to cellular physiology and phenotypic diversity across yeast strains and species.
Affiliation(s)
- Saurin B. Parikh
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Carly Houghton
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- S. Branden Van Oss
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Aaron Wacholder
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
13
Abstract
"De novo" genes evolve from previously non-genic DNA. This strikes many of us as remarkable, because it seems extraordinarily unlikely that random sequence would produce a functional gene. How is this possible? In this two-part review, I first summarize what is known about the origins and molecular functions of the small number of de novo genes for which such information is available. I then speculate on what these examples may tell us about how de novo genes manage to emerge despite what seem like enormous opposing odds.
Affiliation(s)
- Caroline M Weisman
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA.
14
Kosinski LJ, Aviles NR, Gomez K, Masel J. Random peptides rich in small and disorder-promoting amino acids are less likely to be harmful. Genome Biol Evol 2022; 14:evac085. [PMID: 35668555 PMCID: PMC9210321 DOI: 10.1093/gbe/evac085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2021] [Revised: 04/01/2022] [Accepted: 05/27/2022] [Indexed: 11/15/2022]
Abstract
Proteins are the workhorses of the cell, yet they carry great potential for harm via misfolding and aggregation. Despite the dangers, proteins are sometimes born de novo from non-coding DNA. Proteins are more likely to be born from non-coding regions that produce peptides that do little to no harm when translated than from regions that produce harmful peptides. To investigate which newborn proteins are most likely to "first, do no harm", we estimate fitnesses from an experiment that competed Escherichia coli lineages that each expressed a unique random peptide. A variety of peptide metrics significantly predict lineage fitness, but this predictive power stems from simple amino acid frequencies rather than the ordering of amino acids. Amino acids that are smaller and that promote intrinsic structural disorder have more benign fitness effects. We validate that the amino acids that indicate benign effects in random peptides expressed in E. coli also do so in an independent dataset of random N-terminal tags in which it is possible to control for expression level. The same amino acids are also enriched in young animal proteins.
Affiliation(s)
- Luke J Kosinski
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, USA
- Nathan R Aviles
- Graduate Interdisciplinary Program in Statistics, University of Arizona, Tucson, USA
- Kevin Gomez
- Graduate Interdisciplinary Program in Applied Math, University of Arizona, Tucson, USA
- Joanna Masel
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, USA
15
Gilbert A, Saveanu C. Unusual SMG suspects recruit degradation enzymes in nonsense-mediated mRNA decay. Bioessays 2022; 44:e2100296. [PMID: 35266563 DOI: 10.1002/bies.202100296] [Received: 12/29/2021] [Revised: 02/27/2022] [Accepted: 03/02/2022] [Indexed: 11/09/2022]
Abstract
Degradation of eukaryotic RNAs that contain premature termination codons (PTC) during nonsense-mediated mRNA decay (NMD) is initiated by RNA decapping or endonucleolytic cleavage driven by conserved factors. Models for NMD mechanisms, including recognition of PTCs or the timing and role of protein phosphorylation for RNA degradation are challenged by new results. For example, the depletion of the SMG5/7 heterodimer, thought to activate RNA degradation by decapping, leads to a phenotype showing a defect of endonucleolytic activity of NMD complexes. This phenotype is not correlated to a decreased binding of the endonuclease SMG6 with the core NMD factor UPF1, suggesting that it is the result of an imbalance between active (e.g., in polysomes) and inactive (e.g., in RNA-protein condensates) states of NMD complexes. Such imbalance between multiple complexes is not restricted to NMD and should be taken into account when establishing causal links between gene function perturbation and observed phenotypes.
Affiliation(s)
- Agathe Gilbert
- Institut Pasteur, Sorbonne Université, CNRS UMR-3525, Paris, F-75015, France
- Cosmin Saveanu
- Institut Pasteur, Sorbonne Université, CNRS UMR-3525, Paris, F-75015, France
16
Kute PM, Soukarieh O, Tjeldnes H, Trégouët DA, Valen E. Small Open Reading Frames, How to Find Them and Determine Their Function. Front Genet 2022; 12:796060. [PMID: 35154250 PMCID: PMC8831751 DOI: 10.3389/fgene.2021.796060] [Received: 10/15/2021] [Accepted: 12/30/2021] [Indexed: 12/12/2022]
Abstract
Advances in genomics and molecular biology have revealed an abundance of small open reading frames (sORFs) across all types of transcripts. While these sORFs are often assumed to be non-functional, many have been implicated in physiological functions and a significant number of sORFs have been described in human diseases. Thus, sORFs may represent a hidden repository of functional elements that could serve as therapeutic targets. Unlike protein-coding genes, it is not necessarily the encoded peptide of an sORF that enacts its function, sometimes simply the act of translating an sORF might have a regulatory role. Indeed, the most studied sORFs are located in the 5′UTRs of coding transcripts and can have a regulatory impact on the translation of the downstream protein-coding sequence. However, sORFs have also been abundantly identified in non-coding RNAs including lncRNAs, circular RNAs and ribosomal RNAs suggesting that sORFs may be diverse in function. Of the many different experimental methods used to discover sORFs, the most commonly used are ribosome profiling and mass spectrometry. These can confirm interactions between transcripts and ribosomes and the production of a peptide, respectively. Extensions to ribosome profiling, which also capture scanning ribosomes, have further made it possible to see how sORFs impact the translation initiation of mRNAs. While high-throughput techniques have made the identification of sORFs less difficult, defining their function, if any, is typically more challenging. Together, the abundance and potential function of many of these sORFs argues for the necessity of including sORFs in gene annotations and systematically characterizing these to understand their potential functional roles. In this review, we will focus on the high-throughput methods used in the detection and characterization of sORFs and discuss techniques for validation and functional characterization.
Affiliation(s)
- Preeti Madhav Kute
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway
- Omar Soukarieh
- Department of Molecular Epidemiology Of Vascular and Brain Disorders, INSERM, BPH, U1219, University of Bordeaux, Bordeaux, France
- Håkon Tjeldnes
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- David-Alexandre Trégouët
- Department of Molecular Epidemiology Of Vascular and Brain Disorders, INSERM, BPH, U1219, University of Bordeaux, Bordeaux, France
- Eivind Valen
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway
- Correspondence: Eivind Valen
17
Cherezov RO, Vorontsova JE, Simonova OB. The Phenomenon of Evolutionary “De Novo Generation” of Genes. Russ J Dev Biol 2021. [DOI: 10.1134/s1062360421060035] [Indexed: 11/23/2022]
18
Li J, Singh U, Arendsee Z, Wurtele ES. Landscape of the Dark Transcriptome Revealed Through Re-mining Massive RNA-Seq Data. Front Genet 2021; 12:722981. [PMID: 34484307 PMCID: PMC8415361 DOI: 10.3389/fgene.2021.722981] [Received: 06/09/2021] [Accepted: 07/26/2021] [Indexed: 12/13/2022]
Abstract
The "dark transcriptome" can be considered the multitude of sequences that are transcribed but not annotated as genes. We evaluated expression of 6,692 annotated genes and 29,354 unannotated open reading frames (ORFs) in the Saccharomyces cerevisiae genome across diverse environmental, genetic and developmental conditions (3,457 RNA-Seq samples). Over 30% of the highly transcribed ORFs have translation evidence. Phylostratigraphic analysis infers most of these transcribed ORFs would encode species-specific proteins ("orphan-ORFs"); hundreds have mean expression comparable to annotated genes. These data reveal unannotated ORFs most likely to be protein-coding genes. We partitioned a co-expression matrix by Markov Chain Clustering; the resultant clusters contain 2,468 orphan-ORFs. We provide the aggregated RNA-Seq yeast data with extensive metadata as a project in MetaOmGraph (MOG), a tool designed for interactive analysis and visualization. This approach enables reuse of public RNA-Seq data for exploratory discovery, providing a rich context for experimentalists to make novel, experimentally testable hypotheses about candidate genes.
Affiliation(s)
- Jing Li
- Genetics and Genomics Graduate Program, Iowa State University, Ames, IA, United States
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Urminder Singh
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
- Zebulun Arendsee
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
- Eve Syrkin Wurtele
- Genetics and Genomics Graduate Program, Iowa State University, Ames, IA, United States
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
19
Andjus S, Morillon A, Wery M. From Yeast to Mammals, the Nonsense-Mediated mRNA Decay as a Master Regulator of Long Non-Coding RNAs Functional Trajectory. Noncoding RNA 2021; 7:ncrna7030044. [PMID: 34449682 PMCID: PMC8395947 DOI: 10.3390/ncrna7030044] [Received: 06/15/2021] [Revised: 07/22/2021] [Accepted: 07/25/2021] [Indexed: 12/22/2022]
Abstract
The Nonsense-Mediated mRNA Decay (NMD) has been classically viewed as a translation-dependent RNA surveillance pathway degrading aberrant mRNAs containing premature stop codons. However, it is now clear that mRNA quality control represents only one face of the multiple functions of NMD. Indeed, NMD also regulates the physiological expression of normal mRNAs, and more surprisingly, of long non-coding (lnc)RNAs. Here, we review the different mechanisms of NMD activation in yeast and mammals, and we discuss the molecular bases of the NMD sensitivity of lncRNAs, considering the functional roles of NMD and of translation in the metabolism of these transcripts. In this regard, we describe several examples of functional micropeptides produced from lncRNAs. We propose that translation and NMD provide potent means to regulate the expression of lncRNAs, which might be critical for the cell to respond to environmental changes.
Affiliation(s)
- Sara Andjus
- ncRNA, Epigenetic and Genome Fluidity, Institut Curie, PSL University, Sorbonne Université, CNRS UMR3244, 26 Rue d’Ulm, CEDEX 05, F-75248 Paris, France
- Antonin Morillon
- ncRNA, Epigenetic and Genome Fluidity, Institut Curie, Sorbonne Université, CNRS UMR3244, 26 Rue d’Ulm, CEDEX 05, F-75248 Paris, France
- Correspondence: (A.M.); (M.W.)
- Maxime Wery
- ncRNA, Epigenetic and Genome Fluidity, Institut Curie, Sorbonne Université, CNRS UMR3244, 26 Rue d’Ulm, CEDEX 05, F-75248 Paris, France
- Correspondence: (A.M.); (M.W.)
20
Kosinski LJ, Masel J. Readthrough Errors Purge Deleterious Cryptic Sequences, Facilitating the Birth of Coding Sequences. Mol Biol Evol 2021; 37:1761-1774. [PMID: 32101291 DOI: 10.1093/molbev/msaa046] [Indexed: 02/06/2023]
Abstract
De novo protein-coding innovations sometimes emerge from ancestrally noncoding DNA, despite the expectation that translating random sequences is overwhelmingly likely to be deleterious. The "preadapting selection" hypothesis claims that emergence is facilitated by prior, low-level translation of noncoding sequences via molecular errors. It predicts that selection on polypeptides translated only in error is strong enough to matter and is strongest when erroneous expression is high. To test this hypothesis, we examined noncoding sequences located downstream of stop codons (i.e., those potentially translated by readthrough errors) in Saccharomyces cerevisiae genes. We identified a class of "fragile" proteins under strong selection to reduce readthrough, which are unlikely substrates for co-option. Among the remainder, sequences showing evidence of readthrough translation, as assessed by ribosome profiling, encoded C-terminal extensions with higher intrinsic structural disorder, supporting the preadapting selection hypothesis. The cryptic sequences beyond the stop codon, rather than spillover effects from the regular C-termini, are primarily responsible for the higher disorder. Results are robust to controlling for the fact that stronger selection also reduces the length of C-terminal extensions. These findings indicate that selection acts on 3' UTRs in Saccharomyces cerevisiae to purge potentially deleterious variants of cryptic polypeptides, acting more strongly in genes that experience more readthrough errors.
Affiliation(s)
- Luke J Kosinski
- Molecular and Cellular Biology, University of Arizona, Tucson, AZ
- Joanna Masel
- Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ
21
Majic P, Payne JL. Enhancers Facilitate the Birth of De Novo Genes and Gene Integration into Regulatory Networks. Mol Biol Evol 2021; 37:1165-1178. [PMID: 31845961 PMCID: PMC7086177 DOI: 10.1093/molbev/msz300] [Indexed: 12/12/2022]
Abstract
Regulatory networks control the spatiotemporal gene expression patterns that give rise to and define the individual cell types of multicellular organisms. In eumetazoa, distal regulatory elements called enhancers play a key role in determining the structure of such networks, particularly the wiring diagram of “who regulates whom.” Mutations that affect enhancer activity can therefore rewire regulatory networks, potentially causing adaptive changes in gene expression. Here, we use whole-tissue and single-cell transcriptomic and chromatin accessibility data from mouse to show that enhancers play an additional role in the evolution of regulatory networks: They facilitate network growth by creating transcriptionally active regions of open chromatin that are conducive to de novo gene evolution. Specifically, our comparative transcriptomic analysis with three other mammalian species shows that young, mouse-specific intergenic open reading frames are preferentially located near enhancers, whereas older open reading frames are not. Mouse-specific intergenic open reading frames that are proximal to enhancers are more highly and stably transcribed than those that are not proximal to enhancers or promoters, and they are transcribed in a limited diversity of cellular contexts. Furthermore, we report several instances of mouse-specific intergenic open reading frames proximal to promoters showing evidence of being repurposed enhancers. We also show that open reading frames gradually acquire interactions with enhancers over macroevolutionary timescales, helping integrate genes—those that have arisen de novo or by other means—into existing regulatory networks. Taken together, our results highlight a dual role of enhancers in expanding and rewiring gene regulatory networks.
Affiliation(s)
- Paco Majic
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Joshua L Payne
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Corresponding author
22
Uncovering de novo gene birth in yeast using deep transcriptomics. Nat Commun 2021; 12:604. [PMID: 33504782 PMCID: PMC7841160 DOI: 10.1038/s41467-021-20911-3] [Received: 12/17/2019] [Accepted: 01/04/2021] [Indexed: 01/30/2023]
Abstract
De novo gene origination has been recently established as an important mechanism for the formation of new genes. In organisms with a large genome, intergenic and intronic regions provide plenty of raw material for new transcriptional events to occur, but little is known about how de novo transcripts originate in more densely packed genomes. Here, we identify 213 de novo originated transcripts in Saccharomyces cerevisiae using deep transcriptomics and genomic synteny information from multiple yeast species grown in two different conditions. We find that about half of the de novo transcripts are expressed from regions which already harbor other genes in the opposite orientation; these transcripts show similar expression changes in response to stress as their overlapping counterparts, and some appear to translate small proteins. Thus, a large fraction of de novo genes in yeast are likely to co-evolve with already existing genes.
23
Knopp M, Babina AM, Gudmundsdóttir JS, Douglass MV, Trent MS, Andersson DI. A novel type of colistin resistance genes selected from random sequence space. PLoS Genet 2021; 17:e1009227. [PMID: 33411736 PMCID: PMC7790251 DOI: 10.1371/journal.pgen.1009227] [Received: 08/10/2020] [Accepted: 10/27/2020] [Indexed: 11/29/2022]
Abstract
Antibiotic resistance is a rapidly increasing medical problem that severely limits the success of antibiotic treatments, and the identification of resistance determinants is key for surveillance and control of resistance dissemination. Horizontal transfer is the dominant mechanism for spread of resistance genes between bacteria but little is known about the original emergence of resistance genes. Here, we examined experimentally if random sequences can generate novel antibiotic resistance determinants de novo. By utilizing highly diverse expression libraries encoding random sequences to select for open reading frames that confer resistance to the last-resort antibiotic colistin in Escherichia coli, six de novo colistin resistance conferring peptides (Dcr) were identified. The peptides act via direct interactions with the sensor kinase PmrB (also termed BasS in E. coli), causing an activation of the PmrAB two-component system (TCS), modification of the lipid A domain of lipopolysaccharide and subsequent colistin resistance. This kinase activation was extended to other TCS by generation of chimeric sensor kinases. Our results demonstrate that peptides with novel activities mediated via specific peptide-protein interactions in the transmembrane domain of a sensory transducer can be selected de novo, suggesting that the origination of such peptides from non-coding regions is conceivable. In addition, we identified a novel class of resistance determinants for a key antibiotic that is used as a last resort treatment for several significant pathogens. The high-level resistance provided at low expression levels, absence of significant growth defects and the functionality of Dcr peptides across different genera suggest that this class of peptides could potentially evolve as bona fide resistance determinants in natura.
We expressed over 100 million randomly generated DNA sequences in Escherichia coli and selected 6 variants that encode peptides that provide resistance to the last-resort antibiotic colistin. We show that the selected peptides are auxiliary activators of the two-component system PmrAB, and that resistance is mediated via modifications of the cell envelope causing decreased antibiotic uptake. This is the first example where random expression libraries have been employed to select for peptides that perform an activating function by direct peptide-protein interactions in vivo, adding support to the idea that non-coding DNA can serve as a substrate for de novo gene evolution. Additionally, the described peptides expand the narrow list of colistin resistance genes and further analyses of clinical isolates will be necessary to determine if similar resistance determinants have evolved in natura.
Affiliation(s)
- Michael Knopp
- Department of Medical Biochemistry and Microbiology, Uppsala University, Sweden
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- * E-mail: (MK); (DIA)
- Arianne M. Babina
- Department of Medical Biochemistry and Microbiology, Uppsala University, Sweden
- Martin V. Douglass
- Department of Infectious Diseases, College of Veterinary Medicine, University of Georgia, Georgia, United States of America
- M. Stephen Trent
- Department of Infectious Diseases, College of Veterinary Medicine, University of Georgia, Georgia, United States of America
- Department of Microbiology, Franklin College of Arts and Sciences, University of Georgia, Georgia, United States of America
- Dan I. Andersson
- Department of Medical Biochemistry and Microbiology, Uppsala University, Sweden
- * E-mail: (MK); (DIA)
24
Cai B, Li Z, Ma M, Zhang J, Kong S, Abdalla BA, Xu H, Jebessa E, Zhang X, Lawal RA, Nie Q. Long noncoding RNA SMUL suppresses SMURF2 production-mediated muscle atrophy via nonsense-mediated mRNA decay. Mol Ther Nucleic Acids 2020; 23:512-526. [PMID: 33510940 PMCID: PMC7807096 DOI: 10.1016/j.omtn.2020.12.003] [Received: 08/28/2020] [Accepted: 12/06/2020] [Indexed: 12/13/2022]
Abstract
As the world population grows, muscle atrophy leading to muscle wasting could become a bigger risk. Long noncoding RNAs (lncRNAs) are known to play important roles in muscle growth and muscle atrophy. Meanwhile, it has recently come to light that many putative small open reading frames (sORFs) are hidden in lncRNAs; however, their translational capabilities and functions remain unclear. In this study, we uncovered 104 myogenic-associated lncRNAs translated, in at least a small peptide, by integrated transcriptome and proteomic analyses. Furthermore, an upstream ORF (uORF) regulatory network was constructed, and a novel muscle atrophy-associated lncRNA named SMUL (Smad ubiquitin regulatory factor 2 [SMURF2] upstream lncRNA) was identified. SMUL was highly expressed in skeletal muscle, and its expression level was downregulated during myoblast differentiation. SMUL promoted myoblast proliferation and suppressed differentiation in vitro. In vivo, SMUL induced skeletal muscle atrophy and promoted a switch from slow-twitch to fast-twitch fibers. In the meantime, translation of the SMUL sORF disrupted the stability of SMURF2 mRNA. Mechanistically, SMUL restrained SMURF2 production via nonsense-mediated mRNA decay (NMD), participating in the regulation of the transforming growth factor β (TGF-β)/SMAD pathway and further regulating myogenesis and muscle atrophy. Taken together, these results suggest that SMUL could be a novel therapeutic target for muscle atrophy.
Affiliation(s)
- Bolin Cai
- College of Animal Science, Lingnan Guangdong Laboratory of Modern Agriculture & State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, South China Agricultural University, Guangzhou 510642, Guangdong, China
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, and Key Laboratory of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou 510642, Guangdong, China
- Zhenhui Li
- College of Animal Science, Lingnan Guangdong Laboratory of Modern Agriculture & State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, South China Agricultural University, Guangzhou 510642, Guangdong, China
- Laboratory of Neurobiology and Behavior, The Rockefeller University, New York, NY 10065, USA
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, and Key Laboratory of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou 510642, Guangdong, China
- Manting Ma
- College of Animal Science, Lingnan Guangdong Laboratory of Modern Agriculture & State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, South China Agricultural University, Guangzhou 510642, Guangdong, China
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, and Key Laboratory of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou 510642, Guangdong, China
- Jing Zhang
- College of Animal Science, Lingnan Guangdong Laboratory of Modern Agriculture & State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, South China Agricultural University, Guangzhou 510642, Guangdong, China
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, and Key Laboratory of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou 510642, Guangdong, China
- Shaofen Kong
- College of Animal Science, Lingnan Guangdong Laboratory of Modern Agriculture & State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, South China Agricultural University, Guangzhou 510642, Guangdong, China
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, and Key Laboratory of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou 510642, Guangdong, China
- Bahareldin Ali Abdalla
- College of Animal Science, Lingnan Guangdong Laboratory of Modern Agriculture & State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, South China Agricultural University, Guangzhou 510642, Guangdong, China
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, and Key Laboratory of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou 510642, Guangdong, China
- Haiping Xu
- College of Animal Science, Lingnan Guangdong Laboratory of Modern Agriculture & State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, South China Agricultural University, Guangzhou 510642, Guangdong, China
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, and Key Laboratory of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou 510642, Guangdong, China
- Endashaw Jebessa
- College of Animal Science, Lingnan Guangdong Laboratory of Modern Agriculture & State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, South China Agricultural University, Guangzhou 510642, Guangdong, China
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, and Key Laboratory of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou 510642, Guangdong, China
- Xiquan Zhang
- College of Animal Science, Lingnan Guangdong Laboratory of Modern Agriculture & State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, South China Agricultural University, Guangzhou 510642, Guangdong, China
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, and Key Laboratory of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou 510642, Guangdong, China
- Qinghua Nie
- College of Animal Science, Lingnan Guangdong Laboratory of Modern Agriculture & State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, South China Agricultural University, Guangzhou 510642, Guangdong, China
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, and Key Laboratory of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou 510642, Guangdong, China
25
Dowling D, Schmitz JF, Bornberg-Bauer E. Stochastic Gain and Loss of Novel Transcribed Open Reading Frames in the Human Lineage. Genome Biol Evol 2020; 12:2183-2195. [PMID: 33210146 PMCID: PMC7674706 DOI: 10.1093/gbe/evaa194] [Accepted: 09/12/2020] [Indexed: 12/12/2022]
Abstract
In addition to known genes, much of the human genome is transcribed into RNA. Chance formation of novel open reading frames (ORFs) can lead to the translation of myriad new proteins. Some of these ORFs may yield advantageous adaptive de novo proteins. However, widespread translation of noncoding DNA can also produce hazardous protein molecules, which can misfold and/or form toxic aggregates. The dynamics of how de novo proteins emerge from potentially toxic raw materials and what influences their long-term survival are unknown. Here, using transcriptomic data from human and five other primates, we generate a set of transcribed human ORFs at six conservation levels to investigate which properties influence the early emergence and long-term retention of these expressed ORFs. As these taxa diverged from each other relatively recently, we present a fine scale view of the evolution of novel sequences over recent evolutionary time. We find that novel human-restricted ORFs are preferentially located on GC-rich gene-dense chromosomes, suggesting their retention is linked to pre-existing genes. Sequence properties such as intrinsic structural disorder and aggregation propensity, which have been proposed to play a role in the survival of de novo genes, remain unchanged over time. Even very young sequences code for proteins with low aggregation propensities, suggesting that genomic regions with many novel transcribed ORFs are concomitantly less likely to produce ORFs which code for harmful toxic proteins. Our data indicate that the survival of these novel ORFs is largely stochastic rather than shaped by selection.
Affiliation(s)
- Daniel Dowling
- Institute for Evolution and Biodiversity, University of Münster, Germany
- Jonathan F Schmitz
- Institute for Evolution and Biodiversity, University of Münster, Germany
26
Zile K, Dessimoz C, Wurm Y, Masel J. Only a Single Taxonomically Restricted Gene Family in the Drosophila melanogaster Subgroup Can Be Identified with High Confidence. Genome Biol Evol 2020; 12:1355-1366. [PMID: 32589737 PMCID: PMC8059200 DOI: 10.1093/gbe/evaa127] [Accepted: 06/19/2020] [Indexed: 12/12/2022]
Abstract
Taxonomically restricted genes (TRGs) are genes that are present only in one clade. Protein-coding TRGs may evolve de novo from previously noncoding sequences: functional ncRNA, introns, or alternative reading frames of older protein-coding genes, or intergenic sequences. A major challenge in studying de novo genes is the need to avoid both false-positives (nonfunctional open reading frames and/or functional genes that did not arise de novo) and false-negatives. Here, we search conservatively for high-confidence TRGs as the most promising candidates for experimental studies, ensuring functionality through conservation across at least two species, and ensuring de novo status through examination of homologous noncoding sequences. Our pipeline also avoids ascertainment biases associated with preconceptions of how de novo genes are born. We identify one TRG family that evolved de novo in the Drosophila melanogaster subgroup. This TRG family contains single-copy genes in Drosophila simulans and Drosophila sechellia. It originated in an intron of a well-established gene, sharing that intron with another well-established gene upstream. These TRGs contain an intron that predates their open reading frame. These genes have not been previously reported as de novo originated, and to our knowledge, they are the best Drosophila candidates identified so far for experimental studies aimed at elucidating the properties of de novo genes.
Affiliation(s)
- Karina Zile
- Division of Biosciences, University College London, United Kingdom
- Christophe Dessimoz
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, Switzerland
- Department of Genetics, Evolution and Environment, University College London, United Kingdom
- Department of Computer Science, University College London, United Kingdom
- Yannick Wurm
- School of Biological and Chemical Sciences, Queen Mary University of London, United Kingdom
- Alan Turing Institute, London, United Kingdom
- Joanna Masel
- Department of Ecology and Evolutionary Biology, University of Arizona
27
Evolution of novel genes in three-spined stickleback populations. Heredity (Edinb) 2020; 125:50-59. [PMID: 32499660 PMCID: PMC7413265 DOI: 10.1038/s41437-020-0319-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 11/24/2019] [Revised: 04/27/2020] [Accepted: 04/30/2020] [Indexed: 12/22/2022]
Abstract
Eukaryotic genomes frequently acquire new protein-coding genes which may significantly impact an organism’s fitness. Novel genes can be created, for example, by duplication of large genomic regions or de novo, from previously non-coding DNA. Either way, creation of a novel transcript is an essential early step during novel gene emergence. Most studies on the gain-and-loss dynamics of novel genes so far have compared genomes between species, constraining analyses to genes that have remained fixed over long time scales. However, the importance of novel genes for rapid adaptation among populations has recently been shown. Therefore, since little is known about the evolutionary dynamics of transcripts across natural populations, we here study transcriptomes from several tissues and nine geographically distinct populations of an ecological model species, the three-spined stickleback. Our findings suggest that novel genes typically start out as transcripts with low expression and high tissue specificity. Early expression regulation appears to be mediated by gene-body methylation. Although most new and narrowly expressed genes are rapidly lost, those that survive and subsequently spread through populations tend to gain broader and higher expression levels. The properties of the encoded proteins, such as disorder and aggregation propensity, hardly change. Correspondingly, young novel genes are not preferentially under positive selection but older novel genes more often overlap with FST outlier regions. Taken together, expression of the surviving novel genes is rapidly regulated, probably via epigenetic mechanisms, while structural properties of encoded proteins are non-debilitating and might only change much later.
28
Hallin J, Cisneros AF, Hénault M, Fijarczyk A, Dandage R, Bautista C, Landry CR. Similarities in biological processes can be used to bridge ecology and molecular biology. Evol Appl 2020; 13:1335-1350. [PMID: 32684962 PMCID: PMC7359829 DOI: 10.1111/eva.12961] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/28/2019] [Revised: 02/17/2020] [Accepted: 03/16/2020] [Indexed: 01/10/2023]
Abstract
Much of the research in biology aims to understand the origin of diversity. Naturally, ecological diversity was the first object of study, but we now have the necessary tools to probe diversity at molecular scales. The inherent differences in how we study diversity at different scales caused the disciplines of biology to be organized around these levels, from molecular biology to ecology. Here, we illustrate that there are key properties of each scale that emerge from the interactions of simpler components and that these properties are often shared across different levels of organization. This means that ideas from one level of organization can be an inspiration for novel hypotheses to study phenomena at another level. We illustrate this concept with examples of events at the molecular level that have analogs at the organismal or ecological level and vice versa. Through these examples, we illustrate that biological processes at different organization levels are governed by general rules. The study of the same phenomena at different scales could enrich our work through a multidisciplinary approach, which should be a staple in the training of future scientists.
Affiliation(s)
- Johan Hallin
- Département de biochimie, de microbiologie et de bio-informatique, Faculté des sciences et de génie, Université Laval, Québec, Canada
- Département de biologie, Faculté des sciences et de génie, Université Laval, Québec, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
- PROTEO, Le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada
- Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada
- Angel F Cisneros
- Département de biochimie, de microbiologie et de bio-informatique, Faculté des sciences et de génie, Université Laval, Québec, Canada
- Département de biologie, Faculté des sciences et de génie, Université Laval, Québec, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
- PROTEO, Le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada
- Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada
- Mathieu Hénault
- Département de biochimie, de microbiologie et de bio-informatique, Faculté des sciences et de génie, Université Laval, Québec, Canada
- Département de biologie, Faculté des sciences et de génie, Université Laval, Québec, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
- PROTEO, Le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada
- Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada
- Anna Fijarczyk
- Département de biochimie, de microbiologie et de bio-informatique, Faculté des sciences et de génie, Université Laval, Québec, Canada
- Département de biologie, Faculté des sciences et de génie, Université Laval, Québec, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
- PROTEO, Le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada
- Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada
- Rohan Dandage
- Département de biochimie, de microbiologie et de bio-informatique, Faculté des sciences et de génie, Université Laval, Québec, Canada
- Département de biologie, Faculté des sciences et de génie, Université Laval, Québec, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
- PROTEO, Le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada
- Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada
- Carla Bautista
- Département de biochimie, de microbiologie et de bio-informatique, Faculté des sciences et de génie, Université Laval, Québec, Canada
- Département de biologie, Faculté des sciences et de génie, Université Laval, Québec, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
- PROTEO, Le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada
- Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada
- Christian R Landry
- Département de biochimie, de microbiologie et de bio-informatique, Faculté des sciences et de génie, Université Laval, Québec, Canada
- Département de biologie, Faculté des sciences et de génie, Université Laval, Québec, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
- PROTEO, Le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada
- Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada
29
Heames B, Schmitz J, Bornberg-Bauer E. A Continuum of Evolving De Novo Genes Drives Protein-Coding Novelty in Drosophila. J Mol Evol 2020; 88:382-398. [PMID: 32253450 PMCID: PMC7162840 DOI: 10.1007/s00239-020-09939-z] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 08/14/2019] [Accepted: 03/13/2020] [Indexed: 12/13/2022]
Abstract
Orphan genes, lacking detectable homologs in outgroup species, typically represent 10-30% of eukaryotic genomes. Efforts to find the source of these young genes indicate that de novo emergence from non-coding DNA may in part explain their prevalence. Here, we investigate the roots of orphan gene emergence in the Drosophila genus. Across the annotated proteomes of twelve species, we find 6297 orphan genes within 4953 taxon-specific clusters of orthologs. By inferring the ancestral DNA as non-coding for between 550 and 2467 (8.7-39.2%) of these genes, we describe for the first time how de novo emergence contributes to the abundance of clade-specific Drosophila genes. In support of them having functional roles, we show that de novo genes have robust expression and translational support. However, the distinct nucleotide sequences of de novo genes, which have characteristics intermediate between intergenic regions and conserved genes, reflect their recent birth from non-coding DNA. We find that de novo genes encode more disordered proteins than both older genes and intergenic regions. Together, our results suggest that gene emergence from non-coding DNA provides an abundant source of material for the evolution of new proteins. Following gene birth, gradual evolution over large evolutionary timescales moulds sequence properties towards those of conserved genes, resulting in a continuum of properties whose starting points depend on the nucleotide sequences of an initial pool of novel genes.
Affiliation(s)
- Brennen Heames
- Institute for Evolution and Biodiversity, 48149, Münster, Germany
- Jonathan Schmitz
- Institute for Evolution and Biodiversity, 48149, Münster, Germany
30
Abrahams L, Hurst LD. A Depletion of Stop Codons in lincRNA is Owing to Transfer of Selective Constraint from Coding Sequences. Mol Biol Evol 2020; 37:1148-1164. [PMID: 31841162 PMCID: PMC7086181 DOI: 10.1093/molbev/msz299] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/12/2022]
Abstract
Although the constraints on a gene’s sequence are often assumed to reflect the functioning of that gene, here we propose transfer selection, a constraint operating on one class of genes transferred to another, mediated by shared binding factors. We show that such transfer can explain an otherwise paradoxical depletion of stop codons in long intergenic noncoding RNAs (lincRNAs). Serine/arginine-rich proteins direct the splicing machinery by binding exonic splice enhancers (ESEs) in immature mRNA. As coding exons cannot contain stop codons in one reading frame, stop codons should be rare within ESEs. We confirm that the stop codon density (SCD) in ESE motifs is low, even accounting for nucleotide biases. Given that serine/arginine-rich proteins binding ESEs also facilitate lincRNA splicing, a low SCD could transfer to lincRNAs. As predicted, multiexon lincRNA exons are depleted in stop codons, a result not explained by open reading frame (ORF) contamination. Consistent with transfer selection, stop codon depletion in lincRNAs is most acute in exonic regions with the highest ESE density, disappears when ESEs are masked, is consistent with stop codon usage skews in ESEs, and is diminished in both single-exon lincRNAs and introns. Owing to low SCD, the maximum lengths of pseudo-ORFs frequently exceed null expectations. This has implications for ORF annotation and the evolution of de novo protein-coding genes from lincRNAs. We conclude that not all constraints operating on genes need be explained by the functioning of the gene but may instead be transferred owing to shared binding factors.
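The stop codon reasoning in this abstract can be made concrete with a short sketch. The Python below is illustrative only and is not the authors' pipeline: it assumes, for simplicity, that stop codon density (SCD) can be approximated as the number of TAA/TAG/TGA trinucleotides per possible start position pooled over all three frames; it measures the longest stop-free run of codons (a pseudo-ORF) in the three forward frames; and it compares that length with mononucleotide shuffles, a crude null that preserves base composition only. All function names are hypothetical.

```python
import random

# Standard-code stop codons (TAA, TAG, TGA).
STOPS = {"TAA", "TAG", "TGA"}


def stop_codon_density(seq: str) -> float:
    """Stop-codon trinucleotides per possible start position, pooled over all frames.

    Assumed, simplified definition of SCD for illustration only.
    """
    seq = seq.upper()
    n = len(seq) - 2  # number of positions where a trinucleotide can start
    if n <= 0:
        return 0.0
    hits = sum(1 for i in range(n) if seq[i:i + 3] in STOPS)
    return hits / n


def longest_pseudo_orf(seq: str) -> int:
    """Longest run of consecutive non-stop codons in the three forward frames."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        run = 0
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOPS:
                run = 0  # a stop codon ends the current stop-free stretch
            else:
                run += 1
                best = max(best, run)
    return best


def shuffled_null(seq: str, n_shuffles: int = 1000, seed: int = 0) -> list[int]:
    """Longest pseudo-ORF in mononucleotide shuffles (base composition preserved)."""
    rng = random.Random(seed)
    letters = list(seq.upper())
    lengths = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        lengths.append(longest_pseudo_orf("".join(letters)))
    return lengths


def empirical_p(seq: str, n_shuffles: int = 1000, seed: int = 0) -> float:
    """One-sided empirical p-value: how often shuffles reach the observed longest pseudo-ORF."""
    observed = longest_pseudo_orf(seq)
    null = shuffled_null(seq, n_shuffles, seed)
    return (1 + sum(x >= observed for x in null)) / (1 + n_shuffles)


if __name__ == "__main__":
    demo = "ATGAAATAGCCCTTTGGGAAACCCTGA"  # arbitrary toy sequence
    print(stop_codon_density(demo), longest_pseudo_orf(demo), empirical_p(demo, 200))
```

A mononucleotide shuffle ignores dinucleotide and codon structure, and the sketch scans only the forward strand, so a serious test of whether pseudo-ORFs exceed null expectations would likely need a stricter null model and both strands.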
Affiliation(s)
- Liam Abrahams
- Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
- Laurence D Hurst
- Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
31
Rödelsperger C, Prabh N, Sommer RJ. New Gene Origin and Deep Taxon Phylogenomics: Opportunities and Challenges. Trends Genet 2019; 35:914-922. [DOI: 10.1016/j.tig.2019.08.007] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 06/27/2019] [Revised: 08/07/2019] [Accepted: 08/29/2019] [Indexed: 01/22/2023]
32
Li M, Fine RD, Dinda M, Bekiranov S, Smith JS. A Sir2-regulated locus control region in the recombination enhancer of Saccharomyces cerevisiae specifies chromosome III structure. PLoS Genet 2019; 15:e1008339. [PMID: 31461456 PMCID: PMC6736312 DOI: 10.1371/journal.pgen.1008339] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/25/2019] [Revised: 09/10/2019] [Accepted: 08/01/2019] [Indexed: 11/18/2022]
Abstract
The NAD+-dependent histone deacetylase Sir2 was originally identified in Saccharomyces cerevisiae as a silencing factor for HML and HMR, the heterochromatic cassettes utilized as donor templates during mating-type switching. MATa cells preferentially switch to MATα using HML as the donor, which is driven by an adjacent cis-acting element called the recombination enhancer (RE). In this study we demonstrate that Sir2 and the condensin complex are recruited to the RE exclusively in MATa cells, specifically to the promoter of a small gene within the right half of the RE known as RDT1. We also provide evidence that the RDT1 promoter functions as a locus control region (LCR) that regulates both transcription and long-range chromatin interactions. Sir2 represses RDT1 transcription until it is removed from the promoter in response to a dsDNA break at the MAT locus induced by HO endonuclease during mating-type switching. Condensin is also recruited to the RDT1 promoter and is displaced upon HO induction, but does not significantly repress RDT1 transcription. Instead condensin appears to promote mating-type donor preference by maintaining proper chromosome III architecture, which is defined by the interaction of HML with the right arm of chromosome III, including MATa and HMR. Remarkably, eliminating Sir2 and condensin recruitment to the RDT1 promoter disrupts this structure and reveals an aberrant interaction between MATa and HMR, consistent with the partially defective donor preference for this mutant. Global condensin subunit depletion also impairs mating-type switching efficiency and donor preference, suggesting that modulation of chromosome architecture plays a significant role in controlling mating-type switching, thus providing a novel model for dissecting condensin function in vivo.
Affiliation(s)
- Mingguang Li
- Department of Laboratory Medicine, Jilin Medical University, Jilin, China
- Ryan D Fine
- Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, Virginia, United States of America
- Manikarna Dinda
- Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, Virginia, United States of America
- Stefan Bekiranov
- Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, Virginia, United States of America
- Jeffrey S Smith
- Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, Virginia, United States of America
33
Nielly-Thibault L, Landry CR. Differences Between the Raw Material and the Products of de Novo Gene Birth Can Result from Mutational Biases. Genetics 2019; 212:1353-1366. [PMID: 31227545 PMCID: PMC6707459 DOI: 10.1534/genetics.119.302187] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/18/2019] [Accepted: 06/14/2019] [Indexed: 12/03/2022]
Abstract
Proteins are among the most important constituents of biological systems. Because all protein-coding genes have a noncoding ancestral form, the properties of noncoding sequences and how they shape the birth of novel proteins may influence the structure and function of all proteins. Differences between the properties of young proteins and random expectations from noncoding sequences have previously been interpreted as the result of natural selection. However, interpreting such deviations requires a yet-unattained understanding of the raw material of de novo gene birth and its relation to novel functional proteins. We mathematically show that the average properties and selective filtering of the "junk" polypeptides of which this raw material is composed are not the only factors influencing the properties of novel functional proteins. We find that in some biological scenarios, they also depend on the variance of the properties of junk polypeptides and their correlation with the rate of allelic turnover, which may itself depend on mutational biases. This suggests for instance that any property of polypeptides that accelerates their exploration of the sequence space could be overrepresented in novel functional proteins, even if it has a limited effect on adaptive value. To exemplify the use of our general theoretical results, we build a simple model that predicts the mean length and mean intrinsic disorder of novel functional proteins from the genomic GC content and a single evolutionary parameter. This work provides a theoretical framework that can guide the prediction and interpretation of results when studying the de novo emergence of protein-coding genes.
Affiliation(s)
- Lou Nielly-Thibault
- Institut de Biologie Intégrative et des Systèmes, Université Laval, Quebec, Quebec G1V 0A6, Canada
- Département de Biologie, Université Laval, Quebec, Quebec G1V 0A6, Canada
- Département de Biochimie, de Microbiologie et de Bio-Informatique, Université Laval, Quebec, Quebec G1V 0A6, Canada
- PROTEO, Quebec, Quebec G1V 0A6, Canada
- Christian R Landry
- Institut de Biologie Intégrative et des Systèmes, Université Laval, Quebec, Quebec G1V 0A6, Canada
- Département de Biologie, Université Laval, Quebec, Quebec G1V 0A6, Canada
- Département de Biochimie, de Microbiologie et de Bio-Informatique, Université Laval, Quebec, Quebec G1V 0A6, Canada
- PROTEO, Quebec, Quebec G1V 0A6, Canada
34
Pinson MR, Miranda RC. Noncoding RNAs in development and teratology, with focus on effects of cannabis, cocaine, nicotine, and ethanol. Birth Defects Res 2019; 111:1308-1319. [PMID: 31356004 DOI: 10.1002/bdr2.1559] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 06/06/2019] [Revised: 07/11/2019] [Accepted: 07/12/2019] [Indexed: 02/06/2023]
Abstract
Completion of the Human Genome Project has led to the identification of a large number of transcription start sites that are not paired with protein-coding genes, supporting the growing recognition of the abundance of encoded nonprotein-coding RNAs (ncRNAs) and their importance for speciation and species-specific development. Present in both plants and animals, ncRNAs vary in size, function, primary sequence, and secondary structure. While microRNAs (miRNAs) are the best known, a number of other ncRNAs (long[er] nonprotein-coding RNAs, pseudogenes, circular RNAs, and so on) have been shown to play an important role in development, either directly or via networks of proteins and other ncRNAs, including by modulating the impact of miRNAs. Furthermore, these ncRNAs and their developmental regulatory networks are sensitive to teratogens such as ethanol, cannabis, cocaine, and nicotine. A better understanding of the developmental role of ncRNAs and their capacity to mediate teratogenesis is a necessary step in efforts to minimize the long-term consequences of developmental exposures to drugs of abuse. Moreover, with increasing awareness of the prevalence of polydrug use, experimental models will need to incorporate more complex drug exposure paradigms into meaningful assessments of developmental ncRNA function.
Affiliation(s)
- Marisa R Pinson
- Department of Neuroscience and Experimental Therapeutics, Texas A&M Health Science Center, 8447 Riverside Pkwy Suite 1005 MREB, Bryan, Texas
- Rajesh C Miranda
- Department of Neuroscience and Experimental Therapeutics, Texas A&M Health Science Center, 8447 Riverside Pkwy Suite 1005 MREB, Bryan, Texas
35
Ruiz-Orera J, Albà MM. Conserved regions in long non-coding RNAs contain abundant translation and protein-RNA interaction signatures. NAR Genom Bioinform 2019; 1:e2. [PMID: 33575549 PMCID: PMC7671363 DOI: 10.1093/nargab/lqz002] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 05/24/2019] [Revised: 06/14/2019] [Accepted: 07/04/2019] [Indexed: 02/06/2023]
Abstract
The mammalian transcriptome includes thousands of transcripts that do not correspond to annotated protein-coding genes and that are known as long non-coding RNAs (lncRNAs). A handful of lncRNAs have well-characterized regulatory functions but the biological significance of the majority of them is not well understood. LncRNAs that are conserved between mice and humans are likely to be enriched in functional sequences. Here, we investigate the presence of different types of ribosome profiling signatures in lncRNAs and how they relate to sequence conservation. We find that lncRNA-conserved regions contain three times more ORFs with translation evidence than non-conserved ones, and identify nine cases that display significant sequence constraints at the amino acid sequence level. The study also reveals that conserved regions in intergenic lncRNAs are significantly enriched in protein–RNA interaction signatures when compared to non-conserved ones; this includes sites in well-characterized lncRNAs, such as Cyrano, Malat1, Neat1 and Meg3, as well as in tens of lncRNAs of unknown function. This work illustrates how the analysis of ribosome profiling data coupled with evolutionary analysis provides new opportunities to explore the lncRNA functional landscape.
Affiliation(s)
- Jorge Ruiz-Orera
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics, Hospital del Mar Research Institute, Universitat Pompeu Fabra, Dr Aiguader 88, Barcelona 08003, Spain
- M Mar Albà
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics, Hospital del Mar Research Institute, Universitat Pompeu Fabra, Dr Aiguader 88, Barcelona 08003, Spain
- Catalan Institution for Research and Advanced Studies, Passeig Lluís Companys 23, Barcelona 08010, Spain
36
Abstract
The origin of novel genes and beneficial functions is of fundamental interest in evolutionary biology. New genes can originate from different mechanisms, including horizontal gene transfer, duplication-divergence, and de novo from noncoding DNA sequences. Comparative genomics has generated strong evidence for de novo emergence of genes in various organisms, but experimental demonstration of this process has been limited to localized randomization in preexisting structural scaffolds. This bypasses the basic requirement of de novo gene emergence, i.e., lack of an ancestral gene. We constructed highly diverse plasmid libraries encoding randomly generated open reading frames and expressed them in Escherichia coli to identify short peptides that could confer a beneficial and selectable phenotype in vivo (in a living cell). Selections on antibiotic-containing agar plates resulted in the identification of three peptides that increased aminoglycoside resistance up to 48-fold. Combining genetic and functional analyses, we show that the peptides are highly hydrophobic, and by inserting into the membrane, they reduce membrane potential, decrease aminoglycoside uptake, and thereby confer high-level resistance. This study demonstrates that randomized DNA sequences can encode peptides that confer selective benefits and illustrates how expression of random sequences could spark the origination of new genes. In addition, our results also show that this question can be addressed experimentally by expression of highly diverse sequence libraries and subsequent selection for specific functions, such as resistance to toxic compounds, the ability to rescue auxotrophic/temperature-sensitive mutants, and growth on normally nonused carbon sources, allowing the exploration of many different phenotypes.
IMPORTANCE De novo gene origination from nonfunctional DNA sequences was long assumed to be implausible. However, recent studies have shown that large fractions of genomic noncoding DNA are transcribed and translated, potentially generating new genes. Experimental validation of this process so far has been limited to comparative genomics, in vitro selections, or partial randomizations. Here, we describe selection of novel peptides in vivo using fully random synthetic expression libraries. The peptides confer aminoglycoside resistance by inserting into the bacterial membrane and thereby partly reducing membrane potential and decreasing drug uptake. Our results show that beneficial peptides can be selected from random sequence pools in vivo and support the idea that expression of noncoding sequences could spark the origination of new genes.
37
Durand É, Gagnon-Arsenault I, Hallin J, Hatin I, Dubé AK, Nielly-Thibault L, Namy O, Landry CR. Turnover of ribosome-associated transcripts from de novo ORFs produces gene-like characteristics available for de novo gene emergence in wild yeast populations. Genome Res 2019; 29:932-943. [PMID: 31152050 PMCID: PMC6581059 DOI: 10.1101/gr.239822.118] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 05/25/2018] [Accepted: 05/13/2019] [Indexed: 12/17/2022]
Abstract
Little is known about the rate of emergence of de novo genes, what their initial properties are, and how they spread in populations. We examined wild yeast populations (Saccharomyces paradoxus) to characterize the diversity and turnover of intergenic ORFs over short evolutionary timescales. We find that hundreds of intergenic ORFs show translation signatures similar to canonical genes, and we experimentally confirmed the translation of many of these ORFs in laboratory conditions using a reporter assay. Compared with canonical genes, intergenic ORFs have lower translation efficiency, which could imply a lack of optimization for translation or a mechanism to reduce their production cost. Translated intergenic ORFs also tend to have sequence properties that are generally close to those of random intergenic sequences. However, some of the very recent translated intergenic ORFs, which appeared <110 kya, already show gene-like characteristics, suggesting that the raw material for functional innovations could appear over short evolutionary timescales.
Affiliation(s)
- Éléonore Durand
- Institut de Biologie Intégrative et des Systèmes, Département de Biologie, PROTEO, Centre de Recherche en Données Massives de l'Université Laval, Pavillon Charles-Eugène-Marchand, Université Laval, G1V 0A6 Québec, Québec, Canada
- Isabelle Gagnon-Arsenault
- Institut de Biologie Intégrative et des Systèmes, Département de Biologie, PROTEO, Centre de Recherche en Données Massives de l'Université Laval, Pavillon Charles-Eugène-Marchand, Université Laval, G1V 0A6 Québec, Québec, Canada; Département de Biochimie, Microbiologie et Bio-informatique, Université Laval, G1V 0A6 Québec, Québec, Canada
- Johan Hallin
- Institut de Biologie Intégrative et des Systèmes, Département de Biologie, PROTEO, Centre de Recherche en Données Massives de l'Université Laval, Pavillon Charles-Eugène-Marchand, Université Laval, G1V 0A6 Québec, Québec, Canada; Département de Biochimie, Microbiologie et Bio-informatique, Université Laval, G1V 0A6 Québec, Québec, Canada
- Isabelle Hatin
- Institut de Biologie Intégrative de la Cellule (I2BC), CEA, CNRS, Université Paris-Sud, Université Paris-Saclay, 91190 Gif sur Yvette, France
- Alexandre K Dubé
- Institut de Biologie Intégrative et des Systèmes, Département de Biologie, PROTEO, Centre de Recherche en Données Massives de l'Université Laval, Pavillon Charles-Eugène-Marchand, Université Laval, G1V 0A6 Québec, Québec, Canada; Département de Biochimie, Microbiologie et Bio-informatique, Université Laval, G1V 0A6 Québec, Québec, Canada
- Lou Nielly-Thibault
- Institut de Biologie Intégrative et des Systèmes, Département de Biologie, PROTEO, Centre de Recherche en Données Massives de l'Université Laval, Pavillon Charles-Eugène-Marchand, Université Laval, G1V 0A6 Québec, Québec, Canada
- Olivier Namy
- Institut de Biologie Intégrative de la Cellule (I2BC), CEA, CNRS, Université Paris-Sud, Université Paris-Saclay, 91190 Gif sur Yvette, France
- Christian R Landry
- Institut de Biologie Intégrative et des Systèmes, Département de Biologie, PROTEO, Centre de Recherche en Données Massives de l'Université Laval, Pavillon Charles-Eugène-Marchand, Université Laval, G1V 0A6 Québec, Québec, Canada; Département de Biochimie, Microbiologie et Bio-informatique, Université Laval, G1V 0A6 Québec, Québec, Canada
38
Affiliation(s)
- Stephen Branden Van Oss
- Department of Computational and Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America
- Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America
39
Vakirlis N, Hebert AS, Opulente DA, Achaz G, Hittinger CT, Fischer G, Coon JJ, Lafontaine I. A Molecular Portrait of De Novo Genes in Yeasts. Mol Biol Evol 2018; 35:631-645. [PMID: 29220506 DOI: 10.1093/molbev/msx315]
Abstract
New genes, with novel protein functions, can evolve "from scratch" out of intergenic sequences. These de novo genes can integrate into the cell's genetic network and drive important phenotypic innovations. Therefore, identifying de novo genes and understanding how the transition from noncoding to coding occurs are key problems in evolutionary biology. However, identifying de novo genes is a difficult task, hampered by the presence of remote homologs, fast evolving sequences and erroneously annotated protein coding genes. To overcome these limitations, we developed a procedure that handles the usual pitfalls in de novo gene identification and predicted the emergence of 703 de novo gene candidates in 15 yeast species from 2 genera whose phylogeny spans at least 100 million years of evolution. We validated 85 candidates by proteomic data, providing new translation evidence for 25 of them through mass spectrometry experiments. We also unambiguously identified the mutations that enabled the transition from noncoding to coding for 30 Saccharomyces de novo genes. We established that de novo gene origination is a widespread phenomenon in yeasts, with only a few de novo genes ultimately maintained by selection. We also found that de novo genes preferentially emerge next to divergent promoters in GC-rich intergenic regions where the probability of finding a fortuitous and transcribed ORF is the highest. Finally, we found a more than 3-fold enrichment of de novo genes at recombination hot spots, which are GC-rich and nucleosome-free regions, suggesting that meiotic recombination contributes to de novo gene emergence in yeasts.
Affiliation(s)
- Nikolaos Vakirlis
- Sorbonne Universités, UPMC Univ Paris 06, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative UMR7238, 75005 Paris, France
- Alex S Hebert
- Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, WI; DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI
- Dana A Opulente
- Laboratory of Genetics, Genome Center of Wisconsin, J. F. Crow Institute for the Study of Evolution, Wisconsin Energy Institute, University of Wisconsin-Madison, Madison, WI
- Guillaume Achaz
- Atelier de BioInformatique, ISyEB UMR7205 Muséum National d'Histoire Naturelle, Paris, France; SMILE Group, CIRB UMR7241, Collège de France, Paris, France
- Chris Todd Hittinger
- DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI; Laboratory of Genetics, Genome Center of Wisconsin, J. F. Crow Institute for the Study of Evolution, Wisconsin Energy Institute, University of Wisconsin-Madison, Madison, WI
- Gilles Fischer
- Sorbonne Universités, UPMC Univ Paris 06, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative UMR7238, 75005 Paris, France
- Joshua J Coon
- Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, WI; DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI; Department of Biomolecular Chemistry, University of Wisconsin-Madison, Madison, WI; Department of Chemistry, University of Wisconsin-Madison, Madison, WI; Morgridge Institute for Research, Madison, WI
- Ingrid Lafontaine
- Atelier de BioInformatique, ISyEB UMR7205 Muséum National d'Histoire Naturelle, Paris, France; Sorbonne Universités, UPMC Univ Paris 06, CNRS, Institut de Biologie Physico-Chimique, Physiologie Membranaire et Moléculaire du Chloroplaste UMR7141, 75005 Paris, France
40
Translation of Small Open Reading Frames: Roles in Regulation and Evolutionary Innovation. Trends Genet 2018; 35:186-198. [PMID: 30606460 DOI: 10.1016/j.tig.2018.12.003] [Received: 10/12/2018] [Accepted: 12/07/2018]
Abstract
The translatome can be defined as the sum of the RNA sequences that are translated into proteins in the cell by the ribosomal machinery. Until recently, it was generally assumed that the translatome was essentially restricted to evolutionarily conserved proteins encoded by the set of annotated protein-coding genes. However, it has become increasingly clear that it also includes small regulatory open reading frames (ORFs), functional micropeptides, de novo proteins, and the pervasive translation of likely nonfunctional proteins. Many of these ORFs have been discovered thanks to the development of ribosome profiling, a technique to sequence ribosome-protected RNA fragments. To fully capture the diversity of translated ORFs, we propose a comprehensive classification that includes the new types of translated ORFs in addition to standard proteins.
41
Exaptation at the molecular genetic level. Sci China Life Sci 2018; 62:437-452. [PMID: 30798493 DOI: 10.1007/s11427-018-9447-8] [Received: 11/10/2018] [Accepted: 12/01/2018]
Abstract
The realization that body parts of animals and plants can be recruited or coopted for novel functions dates back to, or even predates, the observations of Darwin. S.J. Gould and E.S. Vrba recognized a mode of evolution of characters that differs from adaptation. The umbrella term aptation was supplemented with the concept of exaptation. Unlike adaptations, which are restricted to features built by selection for their current role, exaptations are features that currently enhance fitness, even though their present role was not a result of natural selection. Exaptations can also arise from nonaptations; these are characters that had previously been evolving neutrally. All nonaptations are potential exaptations. The concept of exaptation was expanded to the molecular genetic level, which aided greatly in understanding the enormous potential of neutrally evolving repetitive DNA (including transposed elements, formerly considered junk DNA) for the evolution of genes and genomes. The distinction between adaptations and exaptations is outlined in this review and examples are given. Also elaborated on is the fact that such distinctions are sometimes difficult to determine; this is a widespread phenomenon in biology, where continua abound and clear borders between states and definitions are rare.
42
Kern C, Wang Y, Chitwood J, Korf I, Delany M, Cheng H, Medrano JF, Van Eenennaam AL, Ernst C, Ross P, Zhou H. Genome-wide identification of tissue-specific long non-coding RNA in three farm animal species. BMC Genomics 2018; 19:684. [PMID: 30227846 PMCID: PMC6145346 DOI: 10.1186/s12864-018-5037-7] [Received: 12/19/2017] [Accepted: 08/27/2018]
Abstract
Background Numerous long non-coding RNAs (lncRNAs) have been identified and their roles in gene regulation in humans, mice, and other model organisms studied; however, far less research has been focused on lncRNAs in farm animal species. While previous studies in chickens, cattle, and pigs identified lncRNAs in specific developmental stages or differentially expressed under specific conditions in a limited number of tissues, more comprehensive identification of lncRNAs in these species is needed. The goal of the FAANG Consortium (Functional Annotation of Animal Genomes) is to functionally annotate animal genomes, including the annotation of lncRNAs. As one of the FAANG pilot projects, lncRNAs were identified across eight tissues in two adult male biological replicates from chickens, cattle, and pigs. Results Comprehensive lncRNA annotations for the chicken, cattle, and pig genomes were generated by utilizing RNA-seq from eight tissue types from two biological replicates per species at the adult developmental stage. A total of 9393 lncRNAs in chickens, 7235 lncRNAs in cattle, and 14,429 lncRNAs in pigs were identified. Including novel isoforms and lncRNAs from novel loci, 5288 novel lncRNAs were identified in chickens, 3732 in cattle, and 4870 in pigs. These transcripts match previously known patterns of lncRNAs, such as generally lower expression levels than mRNAs and higher tissue specificity. An analysis of lncRNA conservation across species identified a set of conserved lncRNAs with potential functions associated with chromatin structure and gene regulation. Tissue-specific lncRNAs were identified. Genes proximal to tissue-specific lncRNAs were enriched for GO terms associated with the tissue of origin, such as leukocyte activation in spleen. Conclusions LncRNAs were identified in three important farm animal species using eight tissues from adult individuals. 
About half of the identified lncRNAs were not previously reported in the NCBI annotations for these species. While lncRNAs are less conserved than protein-coding genes, a set of positionally conserved lncRNAs was identified among chickens, cattle, and pigs with potential functions related to chromatin structure and gene regulation. Tissue-specific lncRNAs have potential regulatory functions on genes enriched for tissue-specific GO terms. Future work will include epigenetic data from ChIP-seq experiments to further refine these annotations.
Affiliation(s)
- Colin Kern
- Department of Animal Science, University of California, Davis, Davis, CA, USA
- Ying Wang
- Department of Animal Science, University of California, Davis, Davis, CA, USA
- James Chitwood
- Department of Animal Science, University of California, Davis, Davis, CA, USA
- Ian Korf
- Genome Center, University of California, Davis, Davis, CA, USA
- Mary Delany
- Department of Animal Science, University of California, Davis, Davis, CA, USA
- Hans Cheng
- USDA-ARS, Avian Disease and Oncology Laboratory, East Lansing, MI, USA
- Juan F Medrano
- Department of Animal Science, University of California, Davis, Davis, CA, USA
- Catherine Ernst
- Department of Animal Science, Michigan State University, East Lansing, MI, USA
- Pablo Ross
- Department of Animal Science, University of California, Davis, Davis, CA, USA
- Huaijun Zhou
- Department of Animal Science, University of California, Davis, Davis, CA, USA
43
Developmental Dynamics of Long Noncoding RNA Expression during Sexual Fruiting Body Formation in Fusarium graminearum. mBio 2018; 9:e01292-18. [PMID: 30108170 PMCID: PMC6094484 DOI: 10.1128/mbio.01292-18]
Abstract
Long noncoding RNA (lncRNA) plays important roles in sexual development in eukaryotes. In filamentous fungi, however, little is known about the expression and roles of lncRNAs during fruiting body formation. By profiling developmental transcriptomes during the life cycle of the plant-pathogenic fungus Fusarium graminearum, we identified 547 lncRNAs whose expression was highly dynamic, with about 40% peaking at the meiotic stage. Many lncRNAs were found to be antisense to mRNAs, forming 300 sense-antisense pairs. Although small RNAs were produced from these overlapping loci, antisense lncRNAs appeared not to be involved in gene silencing pathways. Genome-wide analysis of small RNA clusters identified many silenced loci at the meiotic stage. However, we found transcriptionally active small RNA clusters, many of which were associated with lncRNAs. Also, we observed that many antisense lncRNAs and their respective sense transcripts were induced in parallel as the fruiting bodies matured. The nonsense-mediated decay (NMD) pathway is known to determine the fates of lncRNAs as well as mRNAs. Thus, we analyzed mutants defective in NMD and identified a subset of lncRNAs that were induced during sexual development but suppressed by NMD during vegetative growth. These results highlight the developmental stage-specific nature and functional potential of lncRNA expression in shaping the fungal fruiting bodies and provide fundamental resources for studying sexual stage-induced lncRNAs. Fusarium graminearum is the causal agent of the head blight on our major staple crops, wheat and corn. The fruiting body formation on the host plants is indispensable for the disease cycle and epidemics. Long noncoding RNA (lncRNA) molecules are emerging as key regulatory components for sexual development in animals and plants. To date, however, there is a paucity of information on the roles of lncRNAs in fungal fruiting body formation. 
Here we characterized hundreds of lncRNAs that exhibited developmental stage-specific expression patterns during fruiting body formation. Also, we discovered that many lncRNAs were induced in parallel with their overlapping transcripts on the opposite DNA strand during sexual development. Finally, we found a subset of lncRNAs that were regulated by an RNA surveillance system during vegetative growth. This research provides fundamental genomic resources that will spur further investigations on lncRNAs that may play important roles in shaping fungal fruiting bodies.
44
Bekpen C, Xie C, Tautz D. Dealing with the adaptive immune system during de novo evolution of genes from intergenic sequences. BMC Evol Biol 2018; 18:121. [PMID: 30075701 PMCID: PMC6091031 DOI: 10.1186/s12862-018-1232-z] [Received: 01/24/2018] [Accepted: 07/16/2018]
Abstract
Background The adaptive immune system of vertebrates has an extraordinary potential to sense and neutralize foreign antigens entering the body. De novo evolution of genes implies that the genome itself expresses novel antigens from intergenic sequences, which could cause a problem with this immune system. Peptides from these novel proteins could be presented by the major histocompatibility complex (MHC) receptors on the cell surface and would be recognized as foreign. The respective cells would then be attacked and destroyed, or would cause inflammatory responses. Hence, de novo expressed peptides have to be introduced to the immune system as self-peptides to avoid such autoimmune reactions. The regulation of the distinction between self and non-self starts during embryonic development, but continues late into adulthood. It is mostly mediated by specialized cells in the thymus, but can also be conveyed in peripheral tissues, such as the lymph nodes and the spleen. The self-antigens need to be exposed to the reactive T-cells, which requires the expression of the genes in the respective tissues. Since the initial activation of a promoter for new intergenic transcription of a de novo gene could occur in any tissue, we should expect that the evolutionary establishment of a de novo gene in animals with an adaptive immune system should also involve expression in at least one of the tissues that confer self-recognition. Results We have studied this question by analyzing the transcriptomes of multiple tissues from young mice in three closely related natural populations of the house mouse (M. m. domesticus). We find that new intergenic transcription indeed occurs mostly in a single tissue. When a second tissue becomes involved, thymus and spleen are significantly overrepresented.
Conclusions We conclude that the inclusion of de novo transcripts in the processes for the induction of self-tolerance is indeed an important step in the evolution of functional de novo genes in vertebrates.
Affiliation(s)
- Cemalettin Bekpen
- Max-Planck Institute for Evolutionary Biology, August-Thienemannstr. 2, 24306, Plön, Germany
- Chen Xie
- Max-Planck Institute for Evolutionary Biology, August-Thienemannstr. 2, 24306, Plön, Germany
- Diethard Tautz
- Max-Planck Institute for Evolutionary Biology, August-Thienemannstr. 2, 24306, Plön, Germany
45
Abstract
One central goal of genome biology is to understand how the usage of the genome differs between organisms. Our knowledge of genome composition, needed for downstream inferences, is critically dependent on gene annotations, yet problems associated with gene annotation and assembly errors are usually ignored in comparative genomics. Here, we analyze the genomes of 68 species across 12 animal phyla and some single-cell eukaryotes for general trends in genome composition and transcription, taking into account problems of gene annotation. We show that, regardless of genome size, the ratio of introns to intergenic sequence is comparable across essentially all animals, with nearly all deviations dominated by increased intergenic sequence. Genomes of model organisms have ratios much closer to 1:1, suggesting that the majority of published genomes of nonmodel organisms are underannotated and consequently omit substantial numbers of genes, with likely negative impact on evolutionary interpretations. Finally, our results also indicate that most animals transcribe half or more of their genomes arguing against differences in genome usage between animal groups, and also suggesting that the transcribed portion is more dependent on genome size than previously thought.
Affiliation(s)
- Warren R Francis
- Department of Earth and Environmental Sciences, Paleontology and Geobiology, Ludwig-Maximilians-Universität München, Munich, Germany
- Gert Wörheide
- Department of Earth and Environmental Sciences, Paleontology and Geobiology, Ludwig-Maximilians-Universität München, Munich, Germany; GeoBio-Center, Ludwig-Maximilians-Universität München, Munich, Germany; Bavarian State Collection for Paleontology and Geology, Munich, Germany
46
Translation of neutrally evolving peptides provides a basis for de novo gene evolution. Nat Ecol Evol 2018; 2:890-896. [DOI: 10.1038/s41559-018-0506-6] [Received: 06/19/2017] [Accepted: 02/16/2018]
47
Baalsrud HT, Tørresen OK, Solbakken MH, Salzburger W, Hanel R, Jakobsen KS, Jentoft S. De Novo Gene Evolution of Antifreeze Glycoproteins in Codfishes Revealed by Whole Genome Sequence Data. Mol Biol Evol 2018; 35:593-606. [PMID: 29216381 PMCID: PMC5850335 DOI: 10.1093/molbev/msx311]
Abstract
New genes can arise through duplication of a pre-existing gene or de novo from non-coding DNA, providing raw material for evolution of new functions in response to a changing environment. A prime example is the independent evolution of antifreeze glycoprotein genes (afgps) in the Arctic codfishes and Antarctic notothenioids to prevent freezing. However, the highly repetitive nature of these genes complicates studies of their organization. In notothenioids, afgps evolved from an extant gene, yet the evolutionary origin of afgps in codfishes is unknown. Here, we demonstrate that afgps in codfishes have evolved de novo from non-coding DNA 13-18 Ma, coinciding with the cooling of the Northern Hemisphere. Using whole-genome sequence data from several codfishes and notothenioids, we find higher copy number of afgp in species exposed to more severe freezing suggesting a gene dosage effect. Notably, antifreeze function is lost in one lineage of codfishes analogous to the afgp losses in non-Antarctic notothenioids. This indicates that selection can eliminate the antifreeze function when freezing is no longer imminent. In addition, we show that evolution of afgp-assisting antifreeze potentiating protein genes (afpps) in notothenioids coincides with origin and lineage-specific losses of afgp. The origin of afgps in codfishes is one of the first examples of an essential gene born from non-coding DNA in a non-model species. Our study underlines the power of comparative genomics to uncover past molecular signatures of genome evolution, and further highlights the impact of de novo gene origin in response to a changing selection regime.
Affiliation(s)
- Helle Tessand Baalsrud
- Department of Biosciences, Centre for Ecological and Evolutionary Synthesis (CEES), University of Oslo, Oslo, Norway
- Ole Kristian Tørresen
- Department of Biosciences, Centre for Ecological and Evolutionary Synthesis (CEES), University of Oslo, Oslo, Norway
- Monica Hongrø Solbakken
- Department of Biosciences, Centre for Ecological and Evolutionary Synthesis (CEES), University of Oslo, Oslo, Norway
- Walter Salzburger
- Department of Biosciences, Centre for Ecological and Evolutionary Synthesis (CEES), University of Oslo, Oslo, Norway; Zoological Institute, University of Basel, Basel, Switzerland
- Reinhold Hanel
- Institute of Fisheries Ecology, Johann Heinrich von Thünen Institute, Federal Research Institute for Rural Areas, Forestry and Fisheries, Hamburg, Germany
- Kjetill S Jakobsen
- Department of Biosciences, Centre for Ecological and Evolutionary Synthesis (CEES), University of Oslo, Oslo, Norway
- Sissel Jentoft
- Department of Biosciences, Centre for Ecological and Evolutionary Synthesis (CEES), University of Oslo, Oslo, Norway
48
Abstract
Peptides encoded by short open reading frames (sORFs) are usually defined as peptides ≤100 aa long. sORFs were long ignored by automatic genome annotation programs because of the high probability of false discovery. However, improved computational tools, together with the high-throughput RIBO-seq approach, have identified a myriad of translated sORFs. Their importance is becoming evident as experimental validation of their diverse cellular functions accumulates. This Review examines various computational and experimental approaches to sORF identification and summarizes current knowledge of their functional roles in cells.
Affiliation(s)
- Anastasia Chugunova
- Lomonosov Moscow State University, Department of Chemistry and A.N. Belozersky Institute of Physico-Chemical Biology, Moscow 119992, Russia; Skolkovo Institute of Science and Technology, Skolkovo, Moscow Region 143025, Russia
- Tsimafei Navalayeu
- Lomonosov Moscow State University, Department of Chemistry and A.N. Belozersky Institute of Physico-Chemical Biology, Moscow 119992, Russia
- Olga Dontsova
- Lomonosov Moscow State University, Department of Chemistry and A.N. Belozersky Institute of Physico-Chemical Biology, Moscow 119992, Russia; Skolkovo Institute of Science and Technology, Skolkovo, Moscow Region 143025, Russia
- Petr Sergiev
- Lomonosov Moscow State University, Department of Chemistry and A.N. Belozersky Institute of Physico-Chemical Biology, Moscow 119992, Russia; Skolkovo Institute of Science and Technology, Skolkovo, Moscow Region 143025, Russia
49
Li LJ, Leng RX, Fan YG, Pan HF, Ye DQ. Translation of noncoding RNAs: Focus on lncRNAs, pri-miRNAs, and circRNAs. Exp Cell Res 2017; 361:1-8. [PMID: 29031633 DOI: 10.1016/j.yexcr.2017.10.010] [Received: 07/26/2017] [Revised: 09/17/2017] [Accepted: 10/11/2017]
Abstract
The mammalian genome is pervasively transcribed, producing a large number of noncoding RNAs (ncRNAs), including long noncoding RNAs (lncRNAs), primary miRNAs (pri-miRNAs), and circular RNAs (circRNAs). The translation of these ncRNAs has long been overlooked. However, a growing number of ribosome profiling studies in various organisms provide important clues to the unanticipated translation potential of lncRNAs. Moreover, a few functional peptides encoded by lncRNAs and pri-miRNAs underline the significance of their translation. Several recent studies have also provided evidence for the translation of endogenous circRNAs. Given the functional significance exemplified by peptides translated from some ncRNAs, and their pervasive translation, it is not too far-fetched to imagine that abnormal translation of ncRNAs may contribute to human diseases. Though challenging, deciphering ncRNA translation is required for a comprehensive understanding of biology and medicine. In this review, we first present evidence concerning the translation potential of lncRNAs and go on to introduce a few functional short peptides encoded by lncRNAs. Then, salient observations showing translation of pri-miRNAs and circRNAs are described in detail. We end by discussing the impact of ncRNA translation beyond producing peptides and briefly referring to the potential role of abnormal ncRNA translation in human diseases.
Affiliation(s)
- Lian-Ju Li
- Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, 81 Meishan Road, Hefei 230032, Anhui, China; Anhui Province Key Laboratory of Major Autoimmune Diseases, Hefei 230032, Anhui, China
- Rui-Xue Leng
- Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, 81 Meishan Road, Hefei 230032, Anhui, China; Anhui Province Key Laboratory of Major Autoimmune Diseases, Hefei 230032, Anhui, China
- Yin-Guang Fan
- Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, 81 Meishan Road, Hefei 230032, Anhui, China; Anhui Province Key Laboratory of Major Autoimmune Diseases, Hefei 230032, Anhui, China
- Hai-Feng Pan
- Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, 81 Meishan Road, Hefei 230032, Anhui, China; Anhui Province Key Laboratory of Major Autoimmune Diseases, Hefei 230032, Anhui, China
- Dong-Qing Ye
- Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, 81 Meishan Road, Hefei 230032, Anhui, China; Anhui Province Key Laboratory of Major Autoimmune Diseases, Hefei 230032, Anhui, China
50
The New RNA World: Growing Evidence for Long Noncoding RNA Functionality. Trends Genet 2017; 33:665-676. [DOI: 10.1016/j.tig.2017.08.002] [Received: 06/20/2017] [Revised: 08/01/2017] [Accepted: 08/02/2017]