1
|
Liu X, Xiao C, Xu X, Zhang J, Mo F, Chen JY, Delihas N, Zhang L, An NA, Li CY. Origin of functional de novo genes in humans from "hopeful monsters". WILEY INTERDISCIPLINARY REVIEWS. RNA 2024; 15:e1845. [PMID: 38605485 DOI: 10.1002/wrna.1845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/24/2023] [Revised: 03/13/2024] [Accepted: 03/18/2024] [Indexed: 04/13/2024]
Abstract
For a long time, it was believed that new genes arise only from modifications of preexisting genes, but the discovery of de novo protein-coding genes that originated from noncoding DNA regions demonstrates the existence of a "motherless" origination process for new genes. However, the features, distributions, expression profiles, and origin modes of these genes in humans seem to support the notion that their origin is not a purely "motherless" process; rather, these genes arise preferentially from genomic regions encoding preexisting precursors with gene-like features. In such a case, the gene loci are typically not brand new. In this short review, we will summarize the definition and features of human de novo genes and clarify their process of origination from ancestral non-coding genomic regions. In addition, we define the favored precursors, or "hopeful monsters," for the origin of de novo genes and present a discussion of the functional significance of these young genes in brain development and tumorigenesis in humans. This article is categorized under: RNA Evolution and Genomics > RNA and Ribonucleoprotein Evolution.
Collapse
Affiliation(s)
- Xiaoge Liu
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Chunfu Xiao
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Xinwei Xu
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Jie Zhang
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Fan Mo
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Jia-Yu Chen
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Chemistry and Biomedicine Innovation Center (ChemBIC), Nanjing University, Nanjing, China
| | - Nicholas Delihas
- Department of Microbiology and Immunology, Renaissance School of Medicine, Stony Brook University, Stony Brook, New York, USA
| | - Li Zhang
- Chinese Institute for Brain Research, Beijing, China
| | - Ni A An
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Chuan-Yun Li
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Chinese Institute for Brain Research, Beijing, China
- Southwest United Graduate School, Kunming, China
| |
Collapse
|
2
|
Wacholder A, Carvunis AR. Biological factors and statistical limitations prevent detection of most noncanonical proteins by mass spectrometry. PLoS Biol 2023; 21:e3002409. [PMID: 38048358 PMCID: PMC10721188 DOI: 10.1371/journal.pbio.3002409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2023] [Revised: 12/14/2023] [Accepted: 10/30/2023] [Indexed: 12/06/2023] Open
Abstract
Ribosome profiling experiments indicate pervasive translation of short open reading frames (ORFs) outside of annotated protein-coding genes. However, shotgun mass spectrometry (MS) experiments typically detect only a small fraction of the predicted protein products of this noncanonical translation. The rarity of detection could indicate that most predicted noncanonical proteins are rapidly degraded and not present in the cell; alternatively, it could reflect technical limitations. Here, we leveraged recent advances in ribosome profiling and MS to investigate the factors limiting detection of noncanonical proteins in yeast. We show that the low detection rate of noncanonical ORF products can largely be explained by small size and low translation levels and does not indicate that they are unstable or biologically insignificant. In particular, proteins encoded by evolutionarily young genes, including those with well-characterized biological roles, are too short and too lowly expressed to be detected by shotgun MS at current detection sensitivities. Additionally, we find that decoy biases can give misleading estimates of noncanonical protein false discovery rates, potentially leading to false detections. After accounting for these issues, we found strong evidence for 4 noncanonical proteins in MS data, which were also supported by evolution and translation data. These results illustrate the power of MS to validate unannotated genes predicted by ribosome profiling, but also its substantial limitations in finding many biologically relevant lowly expressed proteins.
Collapse
Affiliation(s)
- Aaron Wacholder
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| |
Collapse
|
3
|
Wacholder A, Carvunis AR. Biological Factors and Statistical Limitations Prevent Detection of Most Noncanonical Proteins by Mass Spectrometry. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.09.531963. [PMID: 36945638 PMCID: PMC10028962 DOI: 10.1101/2023.03.09.531963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/14/2023]
Abstract
Ribosome profiling experiments indicate pervasive translation of short open reading frames (ORFs) outside of annotated protein-coding genes. However, shotgun mass spectrometry experiments typically detect only a small fraction of the predicted protein products of this noncanonical translation. The rarity of detection could indicate that most predicted noncanonical proteins are rapidly degraded and not present in the cell; alternatively, it could reflect technical limitations. Here we leveraged recent advances in ribosome profiling and mass spectrometry to investigate the factors limiting detection of noncanonical proteins in yeast. We show that the low detection rate of noncanonical ORF products can largely be explained by small size and low translation levels and does not indicate that they are unstable or biologically insignificant. In particular, proteins encoded by evolutionarily young genes, including those with well-characterized biological roles, are too short and too lowly-expressed to be detected by shotgun mass spectrometry at current detection sensitivities. Additionally, we find that decoy biases can give misleading estimates of noncanonical protein false discovery rates, potentially leading to false detections. After accounting for these issues, we found strong evidence for four noncanonical proteins in mass spectrometry data, which were also supported by evolution and translation data. These results illustrate the power of mass spectrometry to validate unannotated genes predicted by ribosome profiling, but also its substantial limitations in finding many biologically relevant lowly-expressed proteins.
Collapse
|
4
|
Ardern Z. Alternative Reading Frames are an Underappreciated Source of Protein Sequence Novelty. J Mol Evol 2023; 91:570-580. [PMID: 37326679 DOI: 10.1007/s00239-023-10122-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2022] [Accepted: 05/31/2023] [Indexed: 06/17/2023]
Abstract
Protein-coding DNA sequences can be translated into completely different amino acid sequences if the nucleotide triplets used are shifted by a non-triplet amount on the same DNA strand or by translating codons from the opposite strand. Such "alternative reading frames" of protein-coding genes are a major contributor to the evolution of novel protein products. Recent studies demonstrating this include examples across the three domains of cellular life and in viruses. These sequences increase the number of trials potentially available for the evolutionary invention of new genes and also have unusual properties which may facilitate gene origin. There is evidence that the structure of the standard genetic code contributes to the features and gene-likeness of some alternative frame sequences. These findings have important implications across diverse areas of molecular biology, including for genome annotation, structural biology, and evolutionary genomics.
Collapse
|
5
|
Wacholder A, Parikh SB, Coelho NC, Acar O, Houghton C, Chou L, Carvunis AR. A vast evolutionarily transient translatome contributes to phenotype and fitness. Cell Syst 2023; 14:363-381.e8. [PMID: 37164009 PMCID: PMC10348077 DOI: 10.1016/j.cels.2023.04.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2022] [Revised: 01/30/2023] [Accepted: 04/06/2023] [Indexed: 05/12/2023]
Abstract
Translation is the process by which ribosomes synthesize proteins. Ribosome profiling recently revealed that many short sequences previously thought to be noncoding are pervasively translated. To identify protein-coding genes in this noncanonical translatome, we combine an integrative framework for extremely sensitive ribosome profiling analysis, iRibo, with high-powered selection inferences tailored for short sequences. We construct a reference translatome for Saccharomyces cerevisiae comprising 5,400 canonical and almost 19,000 noncanonical translated elements. Only 14 noncanonical elements were evolving under detectable purifying selection. A representative subset of translated elements lacking signatures of selection demonstrated involvement in processes including DNA repair, stress response, and post-transcriptional regulation. Our results suggest that most translated elements are not conserved protein-coding genes and contribute to genotype-phenotype relationships through fast-evolving molecular mechanisms.
Collapse
Affiliation(s)
- Aaron Wacholder
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Saurin Bipin Parikh
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Integrative Systems Biology Program, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Nelson Castilho Coelho
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Omer Acar
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Joint CMU-Pitt PhD Program in Computational Biology, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Carly Houghton
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Joint CMU-Pitt PhD Program in Computational Biology, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Lin Chou
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Integrative Systems Biology Program, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA.
| |
Collapse
|
6
|
Evolution and implications of de novo genes in humans. Nat Ecol Evol 2023:10.1038/s41559-023-02014-y. [PMID: 36928843 DOI: 10.1038/s41559-023-02014-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2022] [Accepted: 02/06/2023] [Indexed: 03/18/2023]
Abstract
Genes and translated open reading frames (ORFs) that emerged de novo from previously non-coding sequences provide species with opportunities for adaptation. When aberrantly activated, some human-specific de novo genes and ORFs have disease-promoting properties-for instance, driving tumour growth. Thousands of putative de novo coding sequences have been described in humans, but we still do not know what fraction of those ORFs has readily acquired a function. Here, we discuss the challenges and controversies surrounding the detection, mechanisms of origin, annotation, validation and characterization of de novo genes and ORFs. Through manual curation of literature and databases, we provide a thorough table with most de novo genes reported for humans to date. We re-evaluate each locus by tracing the enabling mutations and list proposed disease associations, protein characteristics and supporting evidence for translation and protein detection. This work will support future explorations of de novo genes and ORFs in humans.
Collapse
|
7
|
An NA, Zhang J, Mo F, Luan X, Tian L, Shen QS, Li X, Li C, Zhou F, Zhang B, Ji M, Qi J, Zhou WZ, Ding W, Chen JY, Yu J, Zhang L, Shu S, Hu B, Li CY. De novo genes with an lncRNA origin encode unique human brain developmental functionality. Nat Ecol Evol 2023; 7:264-278. [PMID: 36593289 PMCID: PMC9911349 DOI: 10.1038/s41559-022-01925-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2021] [Accepted: 10/04/2022] [Indexed: 01/03/2023]
Abstract
Human de novo genes can originate from neutral long non-coding RNA (lncRNA) loci and are evolutionarily significant in general, yet how and why this all-or-nothing transition to functionality happens remains unclear. Here, in 74 human/hominoid-specific de novo genes, we identified distinctive U1 elements and RNA splice-related sequences accounting for RNA nuclear export, differentiating mRNAs from lncRNAs, and driving the origin of de novo genes from lncRNA loci. The polymorphic sites facilitating the lncRNA-mRNA conversion through regulating nuclear export are selectively constrained, maintaining a boundary that differentiates mRNAs from lncRNAs. The functional new genes actively passing through it thus showed a mode of pre-adaptive origin, in that they acquire functions along with the achievement of their coding potential. As a proof of concept, we verified the regulations of splicing and U1 recognition on the nuclear export efficiency of one of these genes, the ENSG00000205704, in human neural progenitor cells. Notably, knock-out or over-expression of this gene in human embryonic stem cells accelerates or delays the neuronal maturation of cortical organoids, respectively. The transgenic mice with ectopically expressed ENSG00000205704 showed enlarged brains with cortical expansion. We thus demonstrate the key roles of nuclear export in de novo gene origin. These newly originated genes should reflect the novel uniqueness of human brain development.
Collapse
Affiliation(s)
- Ni A. An
- grid.11135.370000 0001 2256 9319Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
| | - Jie Zhang
- grid.11135.370000 0001 2256 9319Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
| | - Fan Mo
- grid.9227.e0000000119573309State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China ,grid.410726.60000 0004 1797 8419University of Chinese Academy of Sciences, Beijing, China
| | - Xuke Luan
- grid.11135.370000 0001 2256 9319Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
| | - Lu Tian
- grid.11135.370000 0001 2256 9319Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
| | - Qing Sunny Shen
- grid.11135.370000 0001 2256 9319Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
| | - Xiangshang Li
- grid.11135.370000 0001 2256 9319Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
| | - Chunqiong Li
- grid.510934.a0000 0005 0398 4153Chinese Institute for Brain Research, Beijing, China
| | - Fanqi Zhou
- grid.506261.60000 0001 0706 7839State Key Laboratory of Medical Molecular Biology, Key Laboratory of RNA Regulation and Hematopoiesis, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, School of Basic Medicine, CAMS and Peking Union Medical College, Beijing, China
| | - Boya Zhang
- grid.9227.e0000000119573309State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China ,grid.410726.60000 0004 1797 8419University of Chinese Academy of Sciences, Beijing, China
| | - Mingjun Ji
- grid.11135.370000 0001 2256 9319Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
| | - Jianhuan Qi
- grid.9227.e0000000119573309State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China ,grid.410726.60000 0004 1797 8419University of Chinese Academy of Sciences, Beijing, China
| | - Wei-Zhen Zhou
- grid.415105.40000 0004 9430 5605State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Wanqiu Ding
- grid.11135.370000 0001 2256 9319Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
| | - Jia-Yu Chen
- grid.41156.370000 0001 2314 964XState Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Chemistry and Biomedicine Innovation Center (ChemBIC), Nanjing University, Nanjing, China
| | - Jia Yu
- grid.506261.60000 0001 0706 7839State Key Laboratory of Medical Molecular Biology, Key Laboratory of RNA Regulation and Hematopoiesis, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, School of Basic Medicine, CAMS and Peking Union Medical College, Beijing, China
| | - Li Zhang
- grid.510934.a0000 0005 0398 4153Chinese Institute for Brain Research, Beijing, China
| | - Shaokun Shu
- grid.11135.370000 0001 2256 9319Peking University International Cancer Institute, Beijing, China
| | - Baoyang Hu
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China. .,University of Chinese Academy of Sciences, Beijing, China.
| | - Chuan-Yun Li
- Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, Peking University, Beijing, China. .,Chinese Institute for Brain Research, Beijing, China.
| |
Collapse
|
8
|
Parikh SB, Houghton C, Van Oss SB, Wacholder A, Carvunis A. Origins, evolution, and physiological implications of de novo genes in yeast. Yeast 2022; 39:471-481. [PMID: 35959631 PMCID: PMC9544372 DOI: 10.1002/yea.3810] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 12/03/2022] Open
Abstract
De novo gene birth is the process by which new genes emerge in sequences that were previously noncoding. Over the past decade, researchers have taken advantage of the power of yeast as a model and a tool to study the evolutionary mechanisms and physiological implications of de novo gene birth. We summarize the mechanisms that have been proposed to explicate how noncoding sequences can become protein-coding genes, highlighting the discovery of pervasive translation of the yeast transcriptome and its presumed impact on evolutionary innovation. We summarize current best practices for the identification and characterization of de novo genes. Crucially, we explain that the field is still in its nascency, with the physiological roles of most young yeast de novo genes identified thus far still utterly unknown. We hope this review inspires researchers to investigate the true contribution of de novo gene birth to cellular physiology and phenotypic diversity across yeast strains and species.
Collapse
Affiliation(s)
- Saurin B. Parikh
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and EvolutionUniversity of PittsburghPittsburghPennsylvaniaUSA
| | - Carly Houghton
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and EvolutionUniversity of PittsburghPittsburghPennsylvaniaUSA
| | - S. Branden Van Oss
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and EvolutionUniversity of PittsburghPittsburghPennsylvaniaUSA
| | - Aaron Wacholder
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and EvolutionUniversity of PittsburghPittsburghPennsylvaniaUSA
| | - Anne‐Ruxandra Carvunis
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and EvolutionUniversity of PittsburghPittsburghPennsylvaniaUSA
| |
Collapse
|
9
|
Abstract
"De novo" genes evolve from previously non-genic DNA. This strikes many of us as remarkable, because it seems extraordinarily unlikely that random sequence would produce a functional gene. How is this possible? In this two-part review, I first summarize what is known about the origins and molecular functions of the small number of de novo genes for which such information is available. I then speculate on what these examples may tell us about how de novo genes manage to emerge despite what seem like enormous opposing odds.
Collapse
Affiliation(s)
- Caroline M Weisman
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA.
| |
Collapse
|
10
|
Song H, Guo Z, Zhang X, Sui J. De novo genes in Arachis hypogaea cv. Tifrunner: systematic identification, molecular evolution, and potential contributions to cultivated peanut. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2022; 111:1081-1095. [PMID: 35748398 DOI: 10.1111/tpj.15875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/08/2021] [Revised: 06/15/2022] [Accepted: 06/21/2022] [Indexed: 06/15/2023]
Abstract
De novo genes are derived from non-coding sequences, and they can play essential roles in organisms. Cultivated peanut (Arachis hypogaea) is a major oil and protein crop derived from a cross between Arachis duranensis and Arachis ipaensis. However, few de novo genes have been documented in Arachis. Here, we identified 381 de novo genes in A. hypogaea cv. Tifrunner based on comparison with five closely related Arachis species. There are distinct differences in gene expression patterns and gene structures between conserved and de novo genes. The identified de novo genes originated from ancestral sequence regions associated with metabolic and biosynthetic processes, and they were subsequently integrated into existing regulatory networks. De novo paralogs and homoeologs were identified in A. hypogaea cv. Tifrunner. De novo paralogs and homoeologs with conserved expression have mismatching cis-acting elements under normal growth conditions. De novo genes potentially have pluripotent functions in responses to biotic stresses as well as in growth and development based on quantitative trait locus data. This work provides a foundation for future research examining gene birth processes and gene function in Arachis and related taxa.
Collapse
Affiliation(s)
- Hui Song
- Grassland Agri-husbandry Research Center, College of Grassland Science, Qingdao Agricultural University, Qingdao, China
| | - Zhonglong Guo
- State Key Laboratory of Protein and Plant Gene Research, Peking-Tsinghua Center for Life Sciences, School of Life Sciences and School of Advanced Agricultural Sciences, Peking University, Beijing, China
| | - Xiaojun Zhang
- College of Agronomy, Qingdao Agricultural University, Qingdao, China
| | - Jiongming Sui
- College of Agronomy, Qingdao Agricultural University, Qingdao, China
| |
Collapse
|
11
|
Papadopoulos C, Callebaut I, Gelly JC, Hatin I, Namy O, Renard M, Lespinet O, Lopes A. Intergenic ORFs as elementary structural modules of de novo gene birth and protein evolution. Genome Res 2021; 31:2303-2315. [PMID: 34810219 PMCID: PMC8647833 DOI: 10.1101/gr.275638.121] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2021] [Accepted: 09/23/2021] [Indexed: 01/08/2023]
Abstract
The noncoding genome plays an important role in de novo gene birth and in the emergence of genetic novelty. Nevertheless, how noncoding sequences' properties could promote the birth of novel genes and shape the evolution and the structural diversity of proteins remains unclear. Therefore, by combining different bioinformatic approaches, we characterized the fold potential diversity of the amino acid sequences encoded by all intergenic open reading frames (ORFs) of S. cerevisiae with the aim of (1) exploring whether the structural states' diversity of proteomes is already present in noncoding sequences, and (2) estimating the potential of the noncoding genome to produce novel protein bricks that could either give rise to novel genes or be integrated into pre-existing proteins, thus participating in protein structure diversity and evolution. We showed that amino acid sequences encoded by most yeast intergenic ORFs contain the elementary building blocks of protein structures. Moreover, they encompass the large structural state diversity of canonical proteins, with the majority predicted as foldable. Then, we investigated the early stages of de novo gene birth by reconstructing the ancestral sequences of 70 yeast de novo genes and characterized the sequence and structural properties of intergenic ORFs with a strong translation signal. This enabled us to highlight sequence and structural factors determining de novo gene emergence. Finally, we showed a strong correlation between the fold potential of de novo proteins and one of their ancestral amino acid sequences, reflecting the relationship between the noncoding genome and the protein structure universe.
Collapse
Affiliation(s)
- Chris Papadopoulos
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
| | - Isabelle Callebaut
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, 75005 Paris, France
| | - Jean-Christophe Gelly
- Université de Paris, Biologie Intégrée du Globule Rouge, UMR_S1134, BIGR, INSERM, F-75015 Paris, France
- Laboratoire d'Excellence GR-Ex, 75015 Paris, France
- Institut National de la Transfusion Sanguine, F-75015 Paris, France
| | - Isabelle Hatin
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
| | - Olivier Namy
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
| | - Maxime Renard
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
| | - Olivier Lespinet
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
| | - Anne Lopes
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
| |
Collapse
|
12
|
Castro JF, Tautz D. The Effects of Sequence Length and Composition of Random Sequence Peptides on the Growth of E. coli Cells. Genes (Basel) 2021; 12:1913. [PMID: 34946861 PMCID: PMC8702183 DOI: 10.3390/genes12121913] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2021] [Revised: 11/22/2021] [Accepted: 11/26/2021] [Indexed: 12/21/2022] Open
Abstract
We study the potential for the de novo evolution of genes from random nucleotide sequences using libraries of E. coli expressing random sequence peptides. We assess the effects of such peptides on cell growth by monitoring frequency changes in individual clones in a complex library through four serial passages. Using a new analysis pipeline that allows the tracing of peptides of all lengths, we find that over half of the peptides have consistent effects on cell growth. Across nine different experiments, around 16% of clones increase in frequency and 36% decrease, with some variation between individual experiments. Shorter peptides (8-20 residues), are more likely to increase in frequency, longer ones are more likely to decrease. GC content, amino acid composition, intrinsic disorder, and aggregation propensity show slightly different patterns between peptide groups. Sequences that increase in frequency tend to be more disordered with lower aggregation propensity. This coincides with the observation that young genes with more disordered structures are better tolerated in genomes. Our data indicate that random sequences can be a source of evolutionary innovation, since a large fraction of them are well tolerated by the cells or can provide a growth advantage.
Collapse
Affiliation(s)
| | - Diethard Tautz
- Max Planck Institute for Evolutionary Biology, August-Thienemann Strasse 2, 24306 Plön, Germany;
| |
Collapse
|
13
|
Abrams MB, Dubin CA, AlZaben F, Bravo J, Joubert PM, Weiss CV, Brem RB. Population and comparative genetics of thermotolerance divergence between yeast species. G3 (BETHESDA, MD.) 2021; 11:jkab139. [PMID: 33914073 PMCID: PMC8495929 DOI: 10.1093/g3journal/jkab139] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/21/2021] [Accepted: 04/20/2021] [Indexed: 12/04/2022]
Abstract
Many familiar traits in the natural world-from lions' manes to the longevity of bristlecone pine trees-arose in the distant past, and have long since fixed in their respective species. A key challenge in evolutionary genetics is to figure out how and why species-defining traits have come to be. We used the thermotolerance growth advantage of the yeast Saccharomyces cerevisiae over its sister species Saccharomyces paradoxus as a model for addressing these questions. Analyzing loci at which the S. cerevisiae allele promotes thermotolerance, we detected robust evidence for positive selection, including amino acid divergence between the species and conservation within S. cerevisiae populations. Because such signatures were particularly strong at the chromosome segregation gene ESP1, we used this locus as a case study for focused mechanistic follow-up. Experiments revealed that, in culture at high temperature, the S. paradoxus ESP1 allele conferred a qualitative defect in biomass accumulation and cell division relative to the S. cerevisiae allele. Only genetic divergence in the ESP1 coding region mattered phenotypically, with no functional impact detectable from the promoter. Our data support a model in which an ancient ancestor of S. cerevisiae, under selection to boost viability at high temperature, acquired amino acid variants at ESP1 and many other loci, which have been constrained since then. Complex adaptations of this type hold promise as a paradigm for interspecies genetics, especially in deeply diverged traits that may have taken millions of years to evolve.
Collapse
Affiliation(s)
- Melanie B Abrams
- Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Claire A Dubin
- Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Faisal AlZaben
- Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Juan Bravo
- Graduate Program in the Biology of Aging, University of Southern California, Los Angeles, CA 90095, USA
| | - Pierre M Joubert
- Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Carly V Weiss
- Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Department of Biology, Stanford University, Palo Alto, CA 94305, USA
| | - Rachel B Brem
- Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Buck Institute for Research on Aging, Novato, CA 94945, USA
| |
Collapse
|
14
|
Hata T, Satoh S, Takada N, Matsuo M, Obokata J. Kozak Sequence Acts as a Negative Regulator for De Novo Transcription Initiation of Newborn Coding Sequences in the Plant Genome. Mol Biol Evol 2021; 38:2791-2803. [PMID: 33705557 PMCID: PMC8233501 DOI: 10.1093/molbev/msab069] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022] Open
Abstract
The manner in which newborn coding sequences and their transcriptional competency emerge during the process of gene evolution remains unclear. Here, we experimentally simulated eukaryotic gene origination processes by mimicking horizontal gene transfer events in the plant genome. We mapped the precise position of the transcription start sites (TSSs) of hundreds of newly introduced promoterless firefly luciferase (LUC) coding sequences in the genome of Arabidopsis thaliana cultured cells. The systematic characterization of the LUC-TSSs revealed that 80% of them occurred under the influence of endogenous promoters, while the remainder underwent de novo activation in the intergenic regions, starting from pyrimidine-purine dinucleotides. These de novo TSSs obeyed unexpected rules; they predominantly occurred ∼100 bp upstream of the LUC inserts and did not overlap with Kozak-containing putative open reading frames (ORFs). These features were the output of the immediate responses to the sequence insertions, rather than a bias in the screening of the LUC gene function. Regarding the wild-type genic TSSs, they appeared to have evolved to lack any ORFs in their vicinities. Therefore, the repulsion by the de novo TSSs of Kozak-containing ORFs described above might be the first selection gate for the occurrence and evolution of TSSs in the plant genome. Based on these results, we characterized the de novo type of TSS identified in the plant genome and discuss its significance in genome evolution.
Collapse
Affiliation(s)
- Takayuki Hata
- Graduate School of Life and Environmental Sciences, Kyoto Prefectural University, Sakyo-ku, Kyoto, Kyoto, Japan
- Faculty of Agriculture, Setsunan University, Hirakata, Osaka, Japan
| | - Soichirou Satoh
- Graduate School of Life and Environmental Sciences, Kyoto Prefectural University, Sakyo-ku, Kyoto, Kyoto, Japan
| | - Naoto Takada
- Graduate School of Life and Environmental Sciences, Kyoto Prefectural University, Sakyo-ku, Kyoto, Kyoto, Japan
| | - Mitsuhiro Matsuo
- Faculty of Agriculture, Setsunan University, Hirakata, Osaka, Japan
| | - Junichi Obokata
- Faculty of Agriculture, Setsunan University, Hirakata, Osaka, Japan
| |
Collapse
|
15
|
Hata T, Takada N, Hayakawa C, Kazama M, Uchikoba T, Tachikawa M, Matsuo M, Satoh S, Obokata J. De novo activated transcription of inserted foreign coding sequences is inheritable in the plant genome. PLoS One 2021; 16:e0252674. [PMID: 34111139 PMCID: PMC8191969 DOI: 10.1371/journal.pone.0252674] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2020] [Accepted: 05/19/2021] [Indexed: 01/16/2023] Open
Abstract
The manner in which inserted foreign coding sequences become transcriptionally activated and fixed in the plant genome is poorly understood. To examine such processes of gene evolution, we performed an artificial evolutionary experiment in Arabidopsis thaliana. As a model of gene-birth events, we introduced a promoterless coding sequence of the firefly luciferase (LUC) gene and established 386 T2-generation transgenic lines. Among them, we determined the individual LUC insertion loci in 76 lines and found that one-third of them were transcribed de novo even in the intergenic or inherently unexpressed regions. In the transcribed lines, transcription-related chromatin marks were detected across the newly activated transcribed regions. These results agreed with our previous findings in A. thaliana cultured cells under a similar experimental scheme. A comparison of the results of the T2-plant and cultured cell experiments revealed that the de novo-activated transcription concomitant with local chromatin remodelling was inheritable. During one-generation inheritance, it seems likely that the transcription activities of the LUC inserts trapped by the endogenous genes/transcripts became stronger, while those of de novo transcription in the intergenic/untranscribed regions became weaker. These findings may offer a clue for the elucidation of the mechanism by which inserted foreign coding sequences become transcriptionally activated and fixed in the plant genome.
Collapse
Affiliation(s)
- Takayuki Hata
- Graduate School of Life and Environfmental Sciences, Kyoto Prefectural University, Kyoto-shi, Kyoto, Japan
- Faculty of Agriculture, Setsunan University, Hirakata-shi, Osaka, Japan
| | - Naoto Takada
- Graduate School of Life and Environfmental Sciences, Kyoto Prefectural University, Kyoto-shi, Kyoto, Japan
| | - Chihiro Hayakawa
- Graduate School of Life and Environfmental Sciences, Kyoto Prefectural University, Kyoto-shi, Kyoto, Japan
| | - Mei Kazama
- Graduate School of Life and Environfmental Sciences, Kyoto Prefectural University, Kyoto-shi, Kyoto, Japan
| | - Tomohiro Uchikoba
- Faculty of Life and Environmental Sciences, Kyoto Prefectural University, Kyoto-shi, Kyoto, Japan
| | - Makoto Tachikawa
- Graduate School of Life and Environfmental Sciences, Kyoto Prefectural University, Kyoto-shi, Kyoto, Japan
| | - Mitsuhiro Matsuo
- Faculty of Agriculture, Setsunan University, Hirakata-shi, Osaka, Japan
| | - Soichirou Satoh
- Graduate School of Life and Environfmental Sciences, Kyoto Prefectural University, Kyoto-shi, Kyoto, Japan
- Faculty of Life and Environmental Sciences, Kyoto Prefectural University, Kyoto-shi, Kyoto, Japan
| | - Junichi Obokata
- Faculty of Agriculture, Setsunan University, Hirakata-shi, Osaka, Japan
| |
Collapse
|
16
|
Kosinski LJ, Masel J. Readthrough Errors Purge Deleterious Cryptic Sequences, Facilitating the Birth of Coding Sequences. Mol Biol Evol 2021; 37:1761-1774. [PMID: 32101291 DOI: 10.1093/molbev/msaa046] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023] Open
Abstract
De novo protein-coding innovations sometimes emerge from ancestrally noncoding DNA, despite the expectation that translating random sequences is overwhelmingly likely to be deleterious. The "preadapting selection" hypothesis claims that emergence is facilitated by prior, low-level translation of noncoding sequences via molecular errors. It predicts that selection on polypeptides translated only in error is strong enough to matter and is strongest when erroneous expression is high. To test this hypothesis, we examined noncoding sequences located downstream of stop codons (i.e., those potentially translated by readthrough errors) in Saccharomyces cerevisiae genes. We identified a class of "fragile" proteins under strong selection to reduce readthrough, which are unlikely substrates for co-option. Among the remainder, sequences showing evidence of readthrough translation, as assessed by ribosome profiling, encoded C-terminal extensions with higher intrinsic structural disorder, supporting the preadapting selection hypothesis. The cryptic sequences beyond the stop codon, rather than spillover effects from the regular C-termini, are primarily responsible for the higher disorder. Results are robust to controlling for the fact that stronger selection also reduces the length of C-terminal extensions. These findings indicate that selection acts on 3' UTRs in Saccharomyces cerevisiae to purge potentially deleterious variants of cryptic polypeptides, acting more strongly in genes that experience more readthrough errors.
Collapse
Affiliation(s)
- Luke J Kosinski
- Molecular and Cellular Biology, University of Arizona, Tucson, AZ
| | - Joanna Masel
- Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ
| |
Collapse
|
17
|
Parikh SB, Castilho Coelho N, Carvunis AR. LI Detector: a framework for sensitive colony-based screens regardless of the distribution of fitness effects. G3-GENES GENOMES GENETICS 2021; 11:6161305. [PMID: 33693606 PMCID: PMC8022918 DOI: 10.1093/g3journal/jkaa068] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/15/2020] [Accepted: 12/15/2020] [Indexed: 11/13/2022]
Abstract
Microbial growth characteristics have long been used to investigate fundamental questions of biology. Colony-based high-throughput screens enable parallel fitness estimation of thousands of individual strains using colony growth as a proxy for fitness. However, fitness estimation is complicated by spatial biases affecting colony growth, including uneven nutrient distribution, agar surface irregularities, and batch effects. Analytical methods that have been developed to correct for these spatial biases rely on the following assumptions: (1) that fitness effects are normally distributed, and (2) that most genetic perturbations lead to minor changes in fitness. Although reasonable for many applications, these assumptions are not always warranted and can limit the ability to detect small fitness effects. Beneficial fitness effects, in particular, are notoriously difficult to detect under these assumptions. Here, we developed the linear interpolation-based detector (LI Detector) framework to enable sensitive colony-based screening without making prior assumptions about the underlying distribution of fitness effects. The LI Detector uses a grid of reference colonies to assign a relative fitness value to every colony on the plate. We show that the LI Detector is effective in correcting for spatial biases and equally sensitive toward increase and decrease in fitness. LI Detector offers a tunable system that allows the user to identify small fitness effects with unprecedented sensitivity and specificity. LI Detector can be utilized to develop and refine gene-gene and gene-environment interaction networks of colony-forming organisms, including yeast, by increasing the range of fitness effects that can be reliably detected.
Collapse
Affiliation(s)
- Saurin Bipin Parikh
- Department of Computational and Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
| | - Nelson Castilho Coelho
- Department of Computational and Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
| | - Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
| |
Collapse
|
18
|
Uncovering de novo gene birth in yeast using deep transcriptomics. Nat Commun 2021; 12:604. [PMID: 33504782 PMCID: PMC7841160 DOI: 10.1038/s41467-021-20911-3] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2019] [Accepted: 01/04/2021] [Indexed: 01/30/2023] Open
Abstract
De novo gene origination has been recently established as an important mechanism for the formation of new genes. In organisms with a large genome, intergenic and intronic regions provide plenty of raw material for new transcriptional events to occur, but little is know about how de novo transcripts originate in more densely-packed genomes. Here, we identify 213 de novo originated transcripts in Saccharomyces cerevisiae using deep transcriptomics and genomic synteny information from multiple yeast species grown in two different conditions. We find that about half of the de novo transcripts are expressed from regions which already harbor other genes in the opposite orientation; these transcripts show similar expression changes in response to stress as their overlapping counterparts, and some appear to translate small proteins. Thus, a large fraction of de novo genes in yeast are likely to co-evolve with already existing genes.
Collapse
|
19
|
Puntambekar S, Newhouse R, San-Miguel J, Chauhan R, Vernaz G, Willis T, Wayland MT, Umrania Y, Miska EA, Prabakaran S. Evolutionary divergence of novel open reading frames in cichlids speciation. Sci Rep 2020; 10:21570. [PMID: 33299045 PMCID: PMC7726158 DOI: 10.1038/s41598-020-78555-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2020] [Accepted: 11/26/2020] [Indexed: 01/02/2023] Open
Abstract
Novel open reading frames (nORFs) with coding potential may arise from noncoding DNA. Not much is known about their emergence, functional role, fixation in a population or contribution to adaptive radiation. Cichlids fishes exhibit extensive phenotypic diversification and speciation. Encounters with new environments alone are not sufficient to explain this striking diversity of cichlid radiation because other taxa coexistent with the Cichlidae demonstrate lower species richness. Wagner et al. analyzed cichlid diversification in 46 African lakes and reported that both extrinsic environmental factors and intrinsic lineage-specific traits related to sexual selection have strongly influenced the cichlid radiation, which indicates the existence of unknown molecular mechanisms responsible for rapid phenotypic diversification, such as emergence of novel open reading frames (nORFs). In this study, we integrated transcriptomic and proteomic signatures from two tissues of two cichlids species, identified nORFs and performed evolutionary analysis on these nORF regions. Our results suggest that the time scale of speciation of the two species and evolutionary divergence of these nORF genomic regions are similar and indicate a potential role for these nORFs in speciation of the cichlid fishes.
Collapse
Affiliation(s)
- Shraddha Puntambekar
- Department of Biology, Indian Institute of Science Education and Research, Pune, Maharashtra, 411008, India
| | - Rachel Newhouse
- Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
| | - Jaime San-Miguel
- Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
| | - Ruchi Chauhan
- Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
| | - Grégoire Vernaz
- Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
- The Wellcome Trust/CRUK Gurdon Institute, University of Cambridge, Cambridge, CB2 1QN, UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, CB10 1SA, UK
| | - Thomas Willis
- Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
| | - Matthew T Wayland
- Department of Zoology, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
| | - Yagnesh Umrania
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QR, UK
| | - Eric A Miska
- Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, CB10 1SA, UK
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QR, UK
| | - Sudhakaran Prabakaran
- Department of Biology, Indian Institute of Science Education and Research, Pune, Maharashtra, 411008, India.
- Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK.
- St. Edmund's College, University of Cambridge, Cambridge, CB3 0BN, UK.
| |
Collapse
|
20
|
Evolution of novel genes in three-spined stickleback populations. Heredity (Edinb) 2020; 125:50-59. [PMID: 32499660 PMCID: PMC7413265 DOI: 10.1038/s41437-020-0319-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2019] [Revised: 04/27/2020] [Accepted: 04/30/2020] [Indexed: 12/22/2022] Open
Abstract
Eukaryotic genomes frequently acquire new protein-coding genes which may significantly impact an organism’s fitness. Novel genes can be created, for example, by duplication of large genomic regions or de novo, from previously non-coding DNA. Either way, creation of a novel transcript is an essential early step during novel gene emergence. Most studies on the gain-and-loss dynamics of novel genes so far have compared genomes between species, constraining analyses to genes that have remained fixed over long time scales. However, the importance of novel genes for rapid adaptation among populations has recently been shown. Therefore, since little is known about the evolutionary dynamics of transcripts across natural populations, we here study transcriptomes from several tissues and nine geographically distinct populations of an ecological model species, the three-spined stickleback. Our findings suggest that novel genes typically start out as transcripts with low expression and high tissue specificity. Early expression regulation appears to be mediated by gene-body methylation. Although most new and narrowly expressed genes are rapidly lost, those that survive and subsequently spread through populations tend to gain broader and higher expression levels. The properties of the encoded proteins, such as disorder and aggregation propensity, hardly change. Correspondingly, young novel genes are not preferentially under positive selection but older novel genes more often overlap with FST outlier regions. Taken together, expression of the surviving novel genes is rapidly regulated, probably via epigenetic mechanisms, while structural properties of encoded proteins are non-debilitating and might only change much later.
Collapse
|
21
|
Divergence of Peroxisome Membrane Gene Sequence and Expression Between Yeast Species. G3-GENES GENOMES GENETICS 2020; 10:2079-2085. [PMID: 32317271 PMCID: PMC7263690 DOI: 10.1534/g3.120.401304] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]
Abstract
Large population-genomic sequencing studies can enable highly-powered analyses of sequence signatures of natural selection. Genome repositories now available for Saccharomyces yeast make it a premier model for studies of the molecular mechanisms of adaptation. We mined the genomes of hundreds of isolates of the sister species S. cerevisiae and S. paradoxus to identify sequence hallmarks of adaptive divergence between the two. From the top hits we focused on a set of genes encoding membrane proteins of the peroxisome, an organelle devoted to lipid breakdown and other specialized metabolic pathways. In-depth population- and comparative-genomic sequence analyses of these genes revealed striking divergence between S. cerevisiae and S. paradoxus. And from transcriptional profiles we detected non-neutral, directional cis-regulatory variation at the peroxisome membrane genes, with overall high expression in S. cerevisiae relative to S. paradoxus. Taken together, these data support a model in which yeast species have differentially tuned the expression of peroxisome components to boost their fitness in distinct niches.
Collapse
|
22
|
Hallin J, Cisneros AF, Hénault M, Fijarczyk A, Dandage R, Bautista C, Landry CR. Similarities in biological processes can be used to bridge ecology and molecular biology. Evol Appl 2020; 13:1335-1350. [PMID: 32684962 PMCID: PMC7359829 DOI: 10.1111/eva.12961] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2019] [Revised: 02/17/2020] [Accepted: 03/16/2020] [Indexed: 01/10/2023] Open
Abstract
Much of the research in biology aims to understand the origin of diversity. Naturally, ecological diversity was the first object of study, but we now have the necessary tools to probe diversity at molecular scales. The inherent differences in how we study diversity at different scales caused the disciplines of biology to be organized around these levels, from molecular biology to ecology. Here, we illustrate that there are key properties of each scale that emerge from the interactions of simpler components and that these properties are often shared across different levels of organization. This means that ideas from one level of organization can be an inspiration for novel hypotheses to study phenomena at another level. We illustrate this concept with examples of events at the molecular level that have analogs at the organismal or ecological level and vice versa. Through these examples, we illustrate that biological processes at different organization levels are governed by general rules. The study of the same phenomena at different scales could enrich our work through a multidisciplinary approach, which should be a staple in the training of future scientists.
Collapse
Affiliation(s)
- Johan Hallin
- Département de biochimie de microbiologie et de bio-informatique Faculté des sciences et de génie Université Laval Québec Canada.,Département de biologie Faculté des sciences et de génie Université Laval Québec Canada.,Institut de Biologie Intégrative et des Systèmes (IBIS) Université Laval Québec Canada.,PROTEO Le réseau québécois de recherche sur la fonction la structure et l'ingénierie des protéines Université Laval Québec Canada.,Centre de Recherche en Données Massives (CRDM) Université Laval Québec Canada
| | - Angel F Cisneros
- Département de biochimie de microbiologie et de bio-informatique Faculté des sciences et de génie Université Laval Québec Canada.,Département de biologie Faculté des sciences et de génie Université Laval Québec Canada.,Institut de Biologie Intégrative et des Systèmes (IBIS) Université Laval Québec Canada.,PROTEO Le réseau québécois de recherche sur la fonction la structure et l'ingénierie des protéines Université Laval Québec Canada.,Centre de Recherche en Données Massives (CRDM) Université Laval Québec Canada
| | - Mathieu Hénault
- Département de biochimie de microbiologie et de bio-informatique Faculté des sciences et de génie Université Laval Québec Canada.,Département de biologie Faculté des sciences et de génie Université Laval Québec Canada.,Institut de Biologie Intégrative et des Systèmes (IBIS) Université Laval Québec Canada.,PROTEO Le réseau québécois de recherche sur la fonction la structure et l'ingénierie des protéines Université Laval Québec Canada.,Centre de Recherche en Données Massives (CRDM) Université Laval Québec Canada
| | - Anna Fijarczyk
- Département de biochimie de microbiologie et de bio-informatique Faculté des sciences et de génie Université Laval Québec Canada.,Département de biologie Faculté des sciences et de génie Université Laval Québec Canada.,Institut de Biologie Intégrative et des Systèmes (IBIS) Université Laval Québec Canada.,PROTEO Le réseau québécois de recherche sur la fonction la structure et l'ingénierie des protéines Université Laval Québec Canada.,Centre de Recherche en Données Massives (CRDM) Université Laval Québec Canada
| | - Rohan Dandage
- Département de biochimie de microbiologie et de bio-informatique Faculté des sciences et de génie Université Laval Québec Canada.,Département de biologie Faculté des sciences et de génie Université Laval Québec Canada.,Institut de Biologie Intégrative et des Systèmes (IBIS) Université Laval Québec Canada.,PROTEO Le réseau québécois de recherche sur la fonction la structure et l'ingénierie des protéines Université Laval Québec Canada.,Centre de Recherche en Données Massives (CRDM) Université Laval Québec Canada
| | - Carla Bautista
- Département de biochimie de microbiologie et de bio-informatique Faculté des sciences et de génie Université Laval Québec Canada.,Département de biologie Faculté des sciences et de génie Université Laval Québec Canada.,Institut de Biologie Intégrative et des Systèmes (IBIS) Université Laval Québec Canada.,PROTEO Le réseau québécois de recherche sur la fonction la structure et l'ingénierie des protéines Université Laval Québec Canada.,Centre de Recherche en Données Massives (CRDM) Université Laval Québec Canada
| | - Christian R Landry
- Département de biochimie de microbiologie et de bio-informatique Faculté des sciences et de génie Université Laval Québec Canada.,Département de biologie Faculté des sciences et de génie Université Laval Québec Canada.,Institut de Biologie Intégrative et des Systèmes (IBIS) Université Laval Québec Canada.,PROTEO Le réseau québécois de recherche sur la fonction la structure et l'ingénierie des protéines Université Laval Québec Canada.,Centre de Recherche en Données Massives (CRDM) Université Laval Québec Canada
| |
Collapse
|
23
|
Ruiz-Orera J, Villanueva-Cañas JL, Albà MM. Evolution of new proteins from translated sORFs in long non-coding RNAs. Exp Cell Res 2020; 391:111940. [PMID: 32156600 DOI: 10.1016/j.yexcr.2020.111940] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2019] [Revised: 02/26/2020] [Accepted: 03/02/2020] [Indexed: 01/07/2023]
Abstract
High throughput RNA sequencing techniques have revealed that a large fraction of the genome is transcribed into long non-coding RNAs (lncRNAs). Unlike canonical protein-coding genes, lncRNAs do not contain long open reading frames (ORFs) and tend to be poorly conserved across species. However, many of them contain small ORFs (sORFs) that exhibit translation signatures according to ribosome profiling or proteomics data. These sORFs are a source of putative novel proteins; some of them may confer a selective advantage and be maintained over time, a process known as de novo gene birth. Here we review the mechanisms by which randomly occurring sORFs in lncRNAs can become new functional proteins.
Collapse
Affiliation(s)
- Jorge Ruiz-Orera
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
| | | | - M Mar Albà
- Evolutionary Genomics Group, Research Programme in Biomedical Informatics, Hospital Del Mar Research Institute (IMIM), Universitat Pompeu Fabra (UPF), Barcelona, Spain; Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, 08010, Spain.
| |
Collapse
|
24
|
Vakirlis N, Acar O, Hsu B, Castilho Coelho N, Van Oss SB, Wacholder A, Medetgul-Ernar K, Bowman RW, Hines CP, Iannotta J, Parikh SB, McLysaght A, Camacho CJ, O'Donnell AF, Ideker T, Carvunis AR. De novo emergence of adaptive membrane proteins from thymine-rich genomic sequences. Nat Commun 2020; 11:781. [PMID: 32034123 PMCID: PMC7005711 DOI: 10.1038/s41467-020-14500-z] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2019] [Accepted: 12/20/2019] [Indexed: 11/14/2022] Open
Abstract
Recent evidence demonstrates that novel protein-coding genes can arise de novo from non-genic loci. This evolutionary innovation is thought to be facilitated by the pervasive translation of non-genic transcripts, which exposes a reservoir of variable polypeptides to natural selection. Here, we systematically characterize how these de novo emerging coding sequences impact fitness in budding yeast. Disruption of emerging sequences is generally inconsequential for fitness in the laboratory and in natural populations. Overexpression of emerging sequences, however, is enriched in adaptive fitness effects compared to overexpression of established genes. We find that adaptive emerging sequences tend to encode putative transmembrane domains, and that thymine-rich intergenic regions harbor a widespread potential to produce transmembrane domains. These findings, together with in-depth examination of the de novo emerging YBR196C-A locus, suggest a novel evolutionary model whereby adaptive transmembrane polypeptides emerge de novo from thymine-rich non-genic regions and subsequently accumulate changes molded by natural selection. There is increasing evidence that protein-coding genes can emerge de novo from noncoding genomic regions. Vakirlis et al. propose that sequences encoding transmembrane polypeptides can emerge de novo in thymine-rich genomic regions and provide organisms with fitness benefits.
Collapse
Affiliation(s)
- Nikolaos Vakirlis
- Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, 2, Ireland
| | - Omer Acar
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States.,Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
| | - Brian Hsu
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA, 92093, United States
| | - Nelson Castilho Coelho
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States.,Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
| | - S Branden Van Oss
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States.,Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
| | - Aaron Wacholder
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States.,Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
| | - Kate Medetgul-Ernar
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA, 92093, United States
| | - Ray W Bowman
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, 15260, United States
| | - Cameron P Hines
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA, 92093, United States
| | - John Iannotta
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States.,Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
| | - Saurin Bipin Parikh
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States.,Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
| | - Aoife McLysaght
- Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, 2, Ireland
| | - Carlos J Camacho
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
| | - Allyson F O'Donnell
- Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States. .,Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, 15260, United States.
| | - Trey Ideker
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA, 92093, United States.
| | - Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States. .,Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States.
| |
Collapse
|
25
|
Witt E, Benjamin S, Svetec N, Zhao L. Testis single-cell RNA-seq reveals the dynamics of de novo gene transcription and germline mutational bias in Drosophila. eLife 2019; 8:e47138. [PMID: 31418408 PMCID: PMC6697446 DOI: 10.7554/elife.47138] [Citation(s) in RCA: 73] [Impact Index Per Article: 14.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2019] [Accepted: 07/06/2019] [Indexed: 12/25/2022] Open
Abstract
The testis is a peculiar tissue in many respects. It shows patterns of rapid gene evolution and provides a hotspot for the origination of genetic novelties such as de novo genes, duplications and mutations. To investigate the expression patterns of genetic novelties across cell types, we performed single-cell RNA-sequencing of adult Drosophila testis. We found that new genes were expressed in various cell types, the patterns of which may be influenced by their mode of origination. In particular, lineage-specific de novo genes are commonly expressed in early spermatocytes, while young duplicated genes are often bimodally expressed. Analysis of germline substitutions suggests that spermatogenesis is a highly reparative process, with the mutational load of germ cells decreasing as spermatogenesis progresses. By elucidating the distribution of genetic novelties across spermatogenesis, this study provides a deeper understanding of how the testis maintains its core reproductive function while being a hotbed of evolutionary innovation.
Collapse
Affiliation(s)
- Evan Witt
- Laboratory of Evolutionary Genetics and GenomicsThe Rockefeller UniversityNew YorkUnited States
| | - Sigi Benjamin
- Laboratory of Evolutionary Genetics and GenomicsThe Rockefeller UniversityNew YorkUnited States
| | - Nicolas Svetec
- Laboratory of Evolutionary Genetics and GenomicsThe Rockefeller UniversityNew YorkUnited States
| | - Li Zhao
- Laboratory of Evolutionary Genetics and GenomicsThe Rockefeller UniversityNew YorkUnited States
| |
Collapse
|
26
|
Nielly-Thibault L, Landry CR. Differences Between the Raw Material and the Products of de Novo Gene Birth Can Result from Mutational Biases. Genetics 2019; 212:1353-1366. [PMID: 31227545 PMCID: PMC6707459 DOI: 10.1534/genetics.119.302187] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2019] [Accepted: 06/14/2019] [Indexed: 12/03/2022] Open
Abstract
Proteins are among the most important constituents of biological systems. Because all protein-coding genes have a noncoding ancestral form, the properties of noncoding sequences and how they shape the birth of novel proteins may influence the structure and function of all proteins. Differences between the properties of young proteins and random expectations from noncoding sequences have previously been interpreted as the result of natural selection. However, interpreting such deviations requires a yet-unattained understanding of the raw material of de novo gene birth and its relation to novel functional proteins. We mathematically show that the average properties and selective filtering of the "junk" polypeptides of which this raw material is composed are not the only factors influencing the properties of novel functional proteins. We find that in some biological scenarios, they also depend on the variance of the properties of junk polypeptides and their correlation with the rate of allelic turnover, which may itself depend on mutational biases. This suggests for instance that any property of polypeptides that accelerates their exploration of the sequence space could be overrepresented in novel functional proteins, even if it has a limited effect on adaptive value. To exemplify the use of our general theoretical results, we build a simple model that predicts the mean length and mean intrinsic disorder of novel functional proteins from the genomic GC content and a single evolutionary parameter. This work provides a theoretical framework that can guide the prediction and interpretation of results when studying the de novo emergence of protein-coding genes.
Collapse
Affiliation(s)
- Lou Nielly-Thibault
- Institut de Biologie Intégrative et des Systèmes, Université Laval, Quebec, Quebec G1V 0A6, Canada
- Département de Biologie, Université Laval, Quebec, Quebec G1V 0A6, Canada
- Département de Biochimie, de Microbiologie et de Bio-Informatique, Université Laval, Quebec, Quebec G1V 0A6, Canada
- PROTEO, Quebec, Quebec G1V 0A6, Canada
| | - Christian R Landry
- Institut de Biologie Intégrative et des Systèmes, Université Laval, Quebec, Quebec G1V 0A6, Canada
- Département de Biologie, Université Laval, Quebec, Quebec G1V 0A6, Canada
- Département de Biochimie, de Microbiologie et de Bio-Informatique, Université Laval, Quebec, Quebec G1V 0A6, Canada
- PROTEO, Quebec, Quebec G1V 0A6, Canada
| |
Collapse
|