1. Song H, Guo Z, Zhang X, Sui J. De novo genes in Arachis hypogaea cv. Tifrunner: systematic identification, molecular evolution, and potential contributions to cultivated peanut. Plant J 2022; 111:1081-1095. [PMID: 35748398 DOI: 10.1111/tpj.15875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2021] [Revised: 06/15/2022] [Accepted: 06/21/2022] [Indexed: 06/15/2023]
Abstract
De novo genes are derived from non-coding sequences, and they can play essential roles in organisms. Cultivated peanut (Arachis hypogaea) is a major oil and protein crop derived from a cross between Arachis duranensis and Arachis ipaensis. However, few de novo genes have been documented in Arachis. Here, we identified 381 de novo genes in A. hypogaea cv. Tifrunner based on comparison with five closely related Arachis species. There are distinct differences in gene expression patterns and gene structures between conserved and de novo genes. The identified de novo genes originated from ancestral sequence regions associated with metabolic and biosynthetic processes, and they were subsequently integrated into existing regulatory networks. De novo paralogs and homoeologs were identified in A. hypogaea cv. Tifrunner. De novo paralogs and homoeologs with conserved expression have mismatching cis-acting elements under normal growth conditions. De novo genes potentially have pluripotent functions in responses to biotic stresses as well as in growth and development based on quantitative trait locus data. This work provides a foundation for future research examining gene birth processes and gene function in Arachis and related taxa.
Affiliation(s)
- Hui Song: Grassland Agri-husbandry Research Center, College of Grassland Science, Qingdao Agricultural University, Qingdao, China
- Zhonglong Guo: State Key Laboratory of Protein and Plant Gene Research, Peking-Tsinghua Center for Life Sciences, School of Life Sciences and School of Advanced Agricultural Sciences, Peking University, Beijing, China
- Xiaojun Zhang: College of Agronomy, Qingdao Agricultural University, Qingdao, China
- Jiongming Sui: College of Agronomy, Qingdao Agricultural University, Qingdao, China
2. Baverstock K. Commentary on: Cause of Cambrian explosion - Terrestrial or Cosmic? Prog Biophys Mol Biol 2018; 136:25-26. [PMID: 29549027 DOI: 10.1016/j.pbiomolbio.2018.03.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/17/2022]
Affiliation(s)
- Keith Baverstock: Department of Environmental and Biological Sciences, University of Eastern Finland, Kuopio Campus, Kuopio, Finland
3. Gubala AM, Schmitz JF, Kearns MJ, Vinh TT, Bornberg-Bauer E, Wolfner MF, Findlay GD. The Goddard and Saturn Genes Are Essential for Drosophila Male Fertility and May Have Arisen De Novo. Mol Biol Evol 2017; 34:1066-1082. [PMID: 28104747 DOI: 10.1093/molbev/msx057] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Indexed: 12/15/2022]
Abstract
New genes arise through a variety of mechanisms, including the duplication of existing genes and the de novo birth of genes from noncoding DNA sequences. While there are numerous examples of duplicated genes with important functional roles, the functions of de novo genes remain largely unexplored. Many newly evolved genes are expressed in the male reproductive tract, suggesting that these evolutionary innovations may provide advantages to males experiencing sexual selection. Using testis-specific RNA interference, we screened 11 putative de novo genes in Drosophila melanogaster for effects on male fertility and identified two, goddard and saturn, that are essential for spermatogenesis and sperm function. Goddard knockdown (KD) males fail to produce mature sperm, while saturn KD males produce few sperm, and these function inefficiently once transferred to females. Consistent with a de novo origin, both genes are identifiable only in Drosophila and are predicted to encode proteins with no sequence similarity to any annotated protein. However, since high levels of divergence prevented the unambiguous identification of the noncoding sequences from which each gene arose, we consider goddard and saturn to be putative de novo genes. Within Drosophila, both genes have been lost in certain lineages, but show conserved, male-specific patterns of expression in the species in which they are found. Goddard is consistently found in single-copy and evolves under purifying selection. In contrast, saturn has diversified through gene duplication and positive selection. These data suggest that de novo genes can acquire essential roles in male reproduction.
Affiliation(s)
- Anna M Gubala: Department of Biology, College of the Holy Cross, Worcester, MA
- Jonathan F Schmitz: Evolutionary Bioinformatics Group, Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Tery T Vinh: Department of Biology, College of the Holy Cross, Worcester, MA
- Erich Bornberg-Bauer: Evolutionary Bioinformatics Group, Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Mariana F Wolfner: Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY
- Geoffrey D Findlay: Department of Biology, College of the Holy Cross, Worcester, MA; Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY
4. McLysaght A, Guerzoni D. New genes from non-coding sequence: the role of de novo protein-coding genes in eukaryotic evolutionary innovation. Philos Trans R Soc Lond B Biol Sci 2016; 370:20140332. [PMID: 26323763 PMCID: PMC4571571 DOI: 10.1098/rstb.2014.0332] [Citation(s) in RCA: 100] [Impact Index Per Article: 12.5] [Indexed: 11/16/2022]
Abstract
The origin of novel protein-coding genes de novo was once considered so improbable as to be impossible. In less than a decade, and especially in the last five years, this view has been overturned by extensive evidence from diverse eukaryotic lineages. There is now evidence that this mechanism has contributed a significant number of genes to genomes of organisms as diverse as Saccharomyces, Drosophila, Plasmodium, Arabidopsis and human. From simple beginnings, these genes have in some instances acquired complex structure, regulated expression and important functional roles. New genes are often thought of as dispensable late additions; however, some recent de novo genes in human can play a role in disease. Rather than an extremely rare occurrence, it is now evident that there is a relatively constant trickle of proto-genes released into the testing ground of natural selection. It is currently unknown whether de novo genes arise primarily through an ‘RNA-first’ or ‘ORF-first’ pathway. Either way, evolutionary tinkering with this pool of genetic potential may have been a significant player in the origins of lineage-specific traits and adaptations.
Affiliation(s)
- Aoife McLysaght: Smurfit Institute of Genetics, University of Dublin, Trinity College Dublin, Dublin 2, Republic of Ireland
- Daniele Guerzoni: Smurfit Institute of Genetics, University of Dublin, Trinity College Dublin, Dublin 2, Republic of Ireland
5. Guerzoni D, McLysaght A. De Novo Genes Arise at a Slow but Steady Rate along the Primate Lineage and Have Been Subject to Incomplete Lineage Sorting. Genome Biol Evol 2016; 8:1222-32. [PMID: 27056411 PMCID: PMC4860702 DOI: 10.1093/gbe/evw074] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Indexed: 12/21/2022]
Abstract
De novo protein-coding gene origination is increasingly recognized as an important evolutionary mechanism. However, there remains a large amount of uncertainty regarding the frequency of these events and the mechanisms and speed of gene establishment. Here, we describe a rigorous search for cases of de novo gene origination in the great apes. We analyzed annotated proteomes as well as full genomic DNA and transcriptional and translational evidence. It is notable that results vary between database updates due to the fluctuating annotation of these genes. Nonetheless we identified 35 de novo genes: 16 human-specific; 5 human and chimpanzee specific; and 14 that originated prior to the divergence of human, chimpanzee, and gorilla and are found in all three genomes. The taxonomically restricted distribution of these genes cannot be explained by loss in other lineages. Each gene is supported by an open reading frame-creating mutation that occurred within the primate lineage, and which is not polymorphic in any species. Similarly to previous studies we find that the de novo genes identified are short and frequently located near pre-existing genes. Also, they may be associated with Alu elements and prior transcription and RNA-splicing at the locus. Additionally, we report the first case of apparent independent lineage sorting of a de novo gene. The gene is present in human and gorilla, whereas chimpanzee has the ancestral noncoding sequence. This indicates a long period of polymorphism prior to fixation and thus supports a model where de novo genes may, at least initially, have a neutral effect on fitness.
Affiliation(s)
- Daniele Guerzoni: Smurfit Institute of Genetics, Department of Genetics, Trinity College Dublin, University of Dublin, Ireland
- Aoife McLysaght: Smurfit Institute of Genetics, Department of Genetics, Trinity College Dublin, University of Dublin, Ireland
6. Carelli FN, Hayakawa T, Go Y, Imai H, Warnefors M, Kaessmann H. The life history of retrocopies illuminates the evolution of new mammalian genes. Genome Res 2016; 26:301-14. [PMID: 26728716 PMCID: PMC4772013 DOI: 10.1101/gr.198473.115] [Citation(s) in RCA: 83] [Impact Index Per Article: 10.4] [Received: 08/18/2015] [Accepted: 12/21/2015] [Indexed: 02/03/2023]
Abstract
New genes contribute substantially to adaptive evolutionary innovation, but the functional evolution of new mammalian genes has been little explored at a broad scale. Previous work established mRNA-derived gene duplicates, known as retrocopies, as models for the study of new gene origination. Here we combine mammalian transcriptomic and epigenomic data to unveil the processes underlying the evolution of stripped-down retrocopies into complex new genes. We show that although some robustly expressed retrocopies are transcribed from preexisting promoters, most evolved new promoters from scratch or recruited proto-promoters in their genomic vicinity. In particular, many retrocopy promoters emerged from ancestral enhancers (or bivalent regulatory elements) or are located in CpG islands not associated with other genes. We detected 88–280 selectively preserved retrocopies per mammalian species, illustrating that these mechanisms facilitated the birth of many functional retrogenes during mammalian evolution. The regulatory evolution of originally monoexonic retrocopies was frequently accompanied by exon gain, which facilitated co-option of distant promoters and allowed expression of alternative isoforms. While young retrogenes are often initially expressed in the testis, increased regulatory and structural complexities allowed retrogenes to functionally diversify and evolve somatic organ functions, sometimes as complex as those of their parents. Thus, some retrogenes evolved the capacity to temporarily substitute for their parents during the process of male meiotic X inactivation, while others rendered parental functions superfluous, allowing for parental gene loss. Overall, our reconstruction of the “life history” of mammalian retrogenes highlights retroposition as a general model for understanding new gene birth and functional evolution.
Affiliation(s)
- Francesco Nicola Carelli: Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Takashi Hayakawa: Department of Wildlife Science (Nagoya Railroad Company, Limited), Primate Research Institute, Kyoto University, Inuyama, Aichi 484-8506, Japan; Japan Monkey Center, Inuyama, Aichi 484-0081, Japan
- Yasuhiro Go: Department of Brain Sciences, Center for Novel Science Initiatives, National Institutes of Natural Sciences, Okazaki, Aichi 444-8585, Japan; Department of Developmental Physiology, National Institute for Physiological Sciences, Okazaki, Aichi 444-8585, Japan; Department of Physiological Sciences, School of Life Science, SOKENDAI (The Graduate University for Advanced Studies), Okazaki, Aichi 484-8585, Japan
- Hiroo Imai: Department of Cellular and Molecular Biology, Primate Research Institute, Kyoto University, Inuyama, Aichi 484-8506, Japan
- Maria Warnefors: Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Henrik Kaessmann: Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
7. Zhang YE, Long M. New genes contribute to genetic and phenotypic novelties in human evolution. Curr Opin Genet Dev 2014; 29:90-6. [PMID: 25218862 PMCID: PMC4631527 DOI: 10.1016/j.gde.2014.08.013] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Received: 07/11/2014] [Revised: 08/26/2014] [Accepted: 08/27/2014] [Indexed: 12/31/2022]
Abstract
New genes in human genomes have been found to be relevant to the evolution and biology of humans. It was conservatively estimated that the human genome encodes more than 300 human-specific genes and 1000 primate-specific genes. These new arrivals appear to be implicated in brain function and male reproduction. Surprisingly, increasing evidence indicates that they may also bring negative pleiotropic effects, while assuming various possible biological functions as sources of phenotypic novelties, suggesting a non-progressive route for functional evolution. Similar to these fixed new genes, polymorphic new genes were found to contribute to functional evolution within species, for example, with respect to digestion or disease resistance, revealing that new genes can acquire new or diverged functions in their initial stage as prototypic genes. This progress has provided new opportunities to explore the genetic basis of human biology and human evolutionary history in a new dimension.
Affiliation(s)
- Yong E Zhang: Key Laboratory of Zoological Systematics and Evolution & State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Manyuan Long: Department of Ecology and Evolution, The University of Chicago, Chicago, USA
8. A survey of innovation through duplication in the reduced genomes of twelve parasites. PLoS One 2014; 9:e99213. [PMID: 24919110 PMCID: PMC4053351 DOI: 10.1371/journal.pone.0099213] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 03/03/2014] [Accepted: 05/12/2014] [Indexed: 11/24/2022]
Abstract
We characterize the prevalence, distribution, divergence, and putative functions of detectable two-copy paralogs and segmental duplications in the Apicomplexa, a phylum of parasitic protists. Apicomplexans are mostly obligate intracellular parasites responsible for human and animal diseases (e.g. malaria and toxoplasmosis). Gene loss is a major force in the phylum. Genomes are small and protein-encoding gene repertoires are reduced. Despite this genomic streamlining, duplications and gene family amplifications are present. The potential for innovation introduced by duplications is of particular interest. We compared genomes of twelve apicomplexans across four lineages and used orthology and genome cartography to map distributions of duplications against genome architectures. Segmental duplications appear limited to five species. Where present, they correspond to regions enriched for multi-copy and species-specific genes, pointing toward roles in adaptation and innovation. We found a phylum-wide association of duplications with dynamic chromosome regions and syntenic breakpoints. Trends in the distribution of duplicated genes indicate that recent, species-specific duplicates are often tandem while most others have been dispersed by genome rearrangements. These trends show a relationship between genome architecture and gene duplication. Functional analysis reveals: proteases, which are vital to a parasitic lifecycle, to be prominent in putative recent duplications; a pair of paralogous genes in Toxoplasma gondii previously shown to produce the rate-limiting step in dopamine synthesis in mammalian cells, a possible link to the modification of host behavior; and phylum-wide differences in expression and subcellular localization, indicative of modes of divergence. 
We have uncovered trends in multiple modes of duplicate divergence including sequence, intron content, expression, subcellular localization, and functions of putative recent duplicates that highlight the role of duplications in the continuum of forces that have shaped these genomes.
9. Marques AC, Ponting CP. Intergenic lncRNAs and the evolution of gene expression. Curr Opin Genet Dev 2014; 27:48-53. [PMID: 24852186 DOI: 10.1016/j.gde.2014.03.009] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Received: 01/30/2014] [Revised: 03/19/2014] [Accepted: 03/19/2014] [Indexed: 11/24/2022]
Abstract
Eukaryote genomes encode a surprisingly large number of noncoding transcripts. Around two-thirds of human transcribed loci do not encode protein, and many are intergenic and produce long (>200 nucleotides) noncoding RNAs (lncRNAs). Extensive analyses using comparative genomics and transcriptomics approaches have established that lncRNA sequence and transcription tend to turn over rapidly during evolution. Our appreciation of the biological roles of lncRNAs, based only on a handful of transcripts with well-characterized functions, is that lncRNAs have diverse roles in regulating gene expression. These proposed roles together with their rapid rates of evolution suggest that lncRNAs could contribute to the divergent expression patterns observed among species and potentially to the origin of new traits.
Affiliation(s)
- Ana C Marques: MRC Functional Genomics Unit, University of Oxford, South Parks Road, OX1 3QX, UK; University of Oxford, Department of Physiology, Anatomy and Genetics, South Parks Road, OX1 3QX, UK
- Chris P Ponting: MRC Functional Genomics Unit, University of Oxford, South Parks Road, OX1 3QX, UK; University of Oxford, Department of Physiology, Anatomy and Genetics, South Parks Road, OX1 3QX, UK
10. Are proposed early genetic codes capable of encoding viable proteins? J Mol Evol 2014; 78:263-74. [PMID: 24826911 DOI: 10.1007/s00239-014-9622-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/09/2013] [Accepted: 04/28/2014] [Indexed: 01/10/2023]
Abstract
Proteins are elaborate biopolymers balancing between contradicting intrinsic propensities to fold, aggregate, or remain disordered. Assessing their primary structural preferences observable without evolutionary optimization has been reinforced by the recent identification of de novo proteins that have emerged from previously non-coding sequences. In this paper we investigate structural preferences of hypothetical proteins translated from random DNA segments using the standard genetic code and three of its proposed evolutionary predecessor models encoding 10, 6, and 4 amino acids, respectively. Our main assumption is that the disorder, aggregation, and transmembrane helix predictions used are able to reflect the differences in the trends of the protein sets investigated. We found that the 10-residue code encodes proteins that resemble modern proteins in their predicted structural properties. All of the investigated early genetic codes give rise to proteins with enhanced disorder and diminished aggregation propensities. Our results suggest that an ancestral genetic code similar to the proposed 10-residue one is capable of encoding functionally diverse proteins but these might have existed under conditions different from today's common physiological ones. The existence of a protein functional repertoire for the investigated earlier stages that is quite distinct from today's can be deduced from the presented results.
11.
Abstract
A single transcript sometimes codes for more than one product. In bacteria, and in a few exceptional animal lineages, many genes are organized into operons: clusters of open reading frames that are transcribed together in a single polycistronic transcript. However, polycistronic transcripts are rare in eukaryotes. One notable exception is that of miRNAs (microRNAs), small RNAs that regulate gene expression at the post-transcriptional level. The primary transcripts of miRNAs commonly produce more than one functional product, by at least three different mechanisms. miRNAs are often produced from polycistronic transcripts together with other miRNA precursors. Also, miRNAs frequently derive from protein-coding gene introns. Finally, each miRNA precursor can produce two mature miRNA products. We argue, in the present review, that miRNAs are frequently hosted in transcripts coding for multiple products because new miRNA precursor sequences that arise by chance in transcribed regions are more likely to become functional miRNAs during evolution.
12. Wissler L, Gadau J, Simola DF, Helmkampf M, Bornberg-Bauer E. Mechanisms and dynamics of orphan gene emergence in insect genomes. Genome Biol Evol 2013; 5:439-55. [PMID: 23348040 PMCID: PMC3590893 DOI: 10.1093/gbe/evt009] [Citation(s) in RCA: 101] [Impact Index Per Article: 9.2] [Indexed: 12/16/2022]
Abstract
Orphan genes are defined as genes that lack detectable similarity to genes in other species and therefore no clear signals of common descent (i.e., homology) can be inferred. Orphans are an enigmatic portion of the genome because their origin and function are mostly unknown and they typically make up 10% to 30% of all genes in a genome. Several case studies demonstrated that orphans can contribute to lineage-specific adaptation. Here, we study orphan genes by comparing 30 arthropod genomes, focusing in particular on seven recently sequenced ant genomes. This setup allows analyzing a major metazoan taxon and a comparison between social Hymenoptera (ants and bees) and nonsocial Diptera (flies and mosquitoes). First, we find that recently split lineages undergo accelerated genomic reorganization, including the rapid gain of many orphan genes. Second, between the two insect orders Hymenoptera and Diptera, orphan genes are more abundant and emerge more rapidly in Hymenoptera, in particular, in leaf-cutter ants. With respect to intragenomic localization, we find that ant orphan genes show little clustering, which suggests that orphan genes in ants are scattered uniformly over the genome and between nonorphan genes. Finally, our results indicate that the genetic mechanisms creating orphan genes—such as gene duplication, frame-shift fixation, creation of overlapping genes, horizontal gene transfer, and exaptation of transposable elements—act at different rates in insects, primates, and plants. In Formicidae, the majority of orphan genes have their origin in intergenic regions, pointing to a high rate of de novo gene formation or generalized gene loss, and supporting a recently proposed dynamic model of frequent gene birth and death.
Affiliation(s)
- Lothar Wissler: Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
13. Xie C, Zhang YE, Chen JY, Liu CJ, Zhou WZ, Li Y, Zhang M, Zhang R, Wei L, Li CY. Hominoid-specific de novo protein-coding genes originating from long non-coding RNAs. PLoS Genet 2012; 8:e1002942. [PMID: 23028352 PMCID: PMC3441637 DOI: 10.1371/journal.pgen.1002942] [Citation(s) in RCA: 116] [Impact Index Per Article: 9.7] [Received: 02/17/2012] [Accepted: 07/24/2012] [Indexed: 01/08/2023]
Abstract
Tinkering with pre-existing genes has long been known as a major way to create new genes. Recently, however, motherless protein-coding genes have been found to have emerged de novo from ancestral non-coding DNAs. How these genes originated is not well addressed to date. Here we identified 24 hominoid-specific de novo protein-coding genes with precise origination timing in vertebrate phylogeny. Strand-specific RNA–Seq analyses were performed in five rhesus macaque tissues (liver, prefrontal cortex, skeletal muscle, adipose, and testis), which were then integrated with public transcriptome data from human, chimpanzee, and rhesus macaque. On the basis of comparing the RNA expression profiles in the three species, we found that most of the hominoid-specific de novo protein-coding genes encoded polyadenylated non-coding RNAs in rhesus macaque or chimpanzee with a similar transcript structure and correlated tissue expression profile. According to the rule of parsimony, the majority of these hominoid-specific de novo protein-coding genes appear to have acquired a regulated transcript structure and expression profile before acquiring coding potential. Interestingly, although the expression profile was largely correlated, the coding genes in human often showed higher transcriptional abundance than their non-coding counterparts in rhesus macaque. The major findings we report in this manuscript are robust and insensitive to the parameters used in the identification and analysis of de novo genes. Our results suggest that at least a portion of long non-coding RNAs, especially those with active and regulated transcription, may serve as a birth pool for protein-coding genes, which are then further optimized at the transcriptional level.
Ever since the pre-genomic era, people believed that “mother gene”-based mechanisms such as gene duplication were the major means of creating new genes. Recently, we and others reported several “motherless” protein-coding genes in human, challenging the conventional idea in that some protein-coding genes might have emerged de novo from ancestral non-coding DNAs. However, how these interesting proteins originated is a question that remained unaddressed. The ancestral non-coding DNA must become transcribed and gain a translatable open reading frame before becoming a protein-coding gene, but either order of these two steps is possible. Here, we performed a comparative transcriptome study in human, chimpanzee, and rhesus macaque to address these fundamental questions. We found that most of the hominoid-specific de novo protein-coding genes encoded long non-coding RNAs in rhesus macaque or chimpanzee, with similar transcript structure and correlated tissue expression profile, but the protein-coding genes often had higher transcriptional abundance. According to the rule of parsimony, we conclude that at least a portion of long non-coding RNAs, especially those with active and regulated transcription, may serve as a birth pool for protein-coding genes that are then further optimized at the transcriptional level, a pattern insensitive to the parameters used in the identification and analysis of de novo genes.
Affiliation(s)
- Chen Xie: Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, College of Life Sciences, Peking University, Beijing, China
- Yong E. Zhang: Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Jia-Yu Chen: Institute of Molecular Medicine, Peking University, Beijing, China
- Chu-Jun Liu: Institute of Molecular Medicine, Peking University, Beijing, China
- Wei-Zhen Zhou: Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, College of Life Sciences, Peking University, Beijing, China
- Ying Li: Institute of Molecular Medicine, Peking University, Beijing, China
- Mao Zhang: Institute of Molecular Medicine, Peking University, Beijing, China
- Rongli Zhang: Institute of Molecular Medicine, Peking University, Beijing, China
- Liping Wei: Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, College of Life Sciences, Peking University, Beijing, China
- Chuan-Yun Li: Institute of Molecular Medicine, Peking University, Beijing, China
14. Sabath N, Wagner A, Karlin D. Evolution of viral proteins originated de novo by overprinting. Mol Biol Evol 2012; 29:3767-80. [PMID: 22821011 PMCID: PMC3494269 DOI: 10.1093/molbev/mss179] [Citation(s) in RCA: 104] [Impact Index Per Article: 8.7] [Indexed: 01/02/2023]
Abstract
New protein-coding genes can originate either through modification of existing genes or de novo. Recently, the importance of de novo origination has been recognized in eukaryotes, although eukaryotic genes originated de novo are relatively rare and difficult to identify. In contrast, viruses contain many de novo genes, namely those in which an existing gene has been “overprinted” by a new open reading frame, a process that generates a new protein-coding gene overlapping the ancestral gene. We analyzed the evolution of 12 experimentally validated viral genes that originated de novo and estimated their relative ages. We found that young de novo genes have a different codon usage from the rest of the genome. They evolve rapidly and are under positive or weak purifying selection. Thus, young de novo genes might have strain-specific functions, or no function, and would be difficult to detect using current genome annotation methods that rely on the sequence signature of purifying selection. In contrast to young de novo genes, older de novo genes have a codon usage that is similar to the rest of the genome. They evolve slowly and are under stronger purifying selection. Some of the oldest de novo genes evolve under stronger selection pressure than the ancestral gene they overlap, suggesting an evolutionary tug of war between the ancestral and the de novo gene.
Affiliation(s)
- Niv Sabath
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland.
|
15
|
Kumar A, Bhandari A, Sinha R, Sardar P, Sushma M, Goyal P, Goswami C, Grapputo A. Molecular phylogeny of OVOL genes illustrates a conserved C2H2 zinc finger domain coupled by hypervariable unstructured regions. PLoS One 2012; 7:e39399. [PMID: 22737237 PMCID: PMC3380836 DOI: 10.1371/journal.pone.0039399] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 02/05/2011] [Accepted: 05/23/2012] [Indexed: 11/19/2022]
Abstract
OVO-like proteins (OVOL) are members of the zinc finger protein family and serve as transcription factors to regulate gene expression in various differentiation processes. Recent studies have shown that OVOL genes are involved in epithelial development and differentiation in a wide variety of organisms; yet there is a lack of comprehensive studies that describe OVOL proteins from an evolutionary perspective. Using comparative genomic analysis, we traced three different OVOL genes (OVOL1-3) in vertebrates. One gene, OVOL3, was duplicated during a whole-genome-duplication event in fish, but only the copy (OVOL3b) was retained. From early-branching metazoa to humans, we found that a core domain, comprising a tetrad of C2H2 zinc fingers, is conserved. By domain comparison of the OVOL proteins, we found that they evolved in different metazoan lineages by attaching intrinsically-disordered (ID) segments of N/C-terminal extensions of 100 to 1000 amino acids to this conserved core. These ID regions originated independently across different animal lineages giving rise to different types of OVOL genes over the course of metazoan evolution. We illustrated the molecular evolution of metazoan OVOL genes over a period of 700 million years (MY). This study both extends our current understanding of the structure/function relationship of metazoan OVOL genes, and assembles a good platform for further characterization of OVOL genes from diverged organisms.
Affiliation(s)
- Abhishek Kumar
- Department of Biology, University of Padua, Padova, Italy.
|
16
|
Mbanefo EC, Chuanxin Y, Kikuchi M, Shuaibu MN, Boamah D, Kirinoki M, Hayashi N, Chigusa Y, Osada Y, Hamano S, Hirayama K. Origin of a novel protein-coding gene family with similar signal sequence in Schistosoma japonicum. BMC Genomics 2012; 13:260. [PMID: 22716200 PMCID: PMC3434034 DOI: 10.1186/1471-2164-13-260] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/02/2012] [Accepted: 06/11/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Evolution of novel protein-coding genes is the bedrock of adaptive evolution. Recently, we identified six protein-coding genes with similar signal sequence from Schistosoma japonicum egg stage mRNA using signal sequence trap (SST). To find the mechanism underlying the origination of these genes with similar core promoter regions and signal sequence, we adopted an integrated approach utilizing whole genome, transcriptome and proteome database BLAST queries, other bioinformatics tools, and molecular analyses. RESULTS: Our data, in combination with database analyses, showed evidence of expression of these genes both at the mRNA and protein levels exclusively in all developmental stages of S. japonicum. The signal sequence motif was identified in 27 distinct S. japonicum UniGene entries with multiple mRNA transcripts, and in 34 genome contigs distributed within 18 scaffolds with evidence of genome-wide dispersion. No homolog of these genes or similar domain was found in deposited data from any other organism. We observed a preponderance of flanking repetitive elements (REs), albeit partial copies, especially of the RTE-like and Perere class at either side of the duplication source locus. The role of REs as major mediators of DNA-level recombination leading to dispersive duplication is discussed with evidence from our analyses. We also identified a stepwise pathway towards functional selection in evolving genes by alternative splicing. Equally, the possible transcription models of some protein-coding representatives of the duplicons are presented with evidence of expression in vitro. CONCLUSION: Our findings contribute to the accumulating evidence of the role of REs in the generation of evolutionary novelties in organisms' genomes.
Affiliation(s)
- Evaristus Chibunna Mbanefo
- Department of Immunogenetics, Institute of Tropical Medicine (NEKKEN), and Global COE Program, Nagasaki University, 1-12-4 Sakamoto, 852-8523, Nagasaki, Japan
- Department of Parasitology and Entomology, Faculty of Bioscience, Nnamdi Azikiwe University, P.M.B. 5025, Awka, Nigeria
| | - Yu Chuanxin
- Laboratory on Technology for Parasitic Disease Prevention and Control, Jiangsu Institute of Parasitic Diseases, 117 Yangxiang, Meiyuan, Wuxi, 214064, People's Republic of China
| | - Mihoko Kikuchi
- Department of Immunogenetics, Institute of Tropical Medicine (NEKKEN), and Global COE Program, Nagasaki University, 1-12-4 Sakamoto, 852-8523, Nagasaki, Japan
| | - Mohammed Nasir Shuaibu
- Department of Immunogenetics, Institute of Tropical Medicine (NEKKEN), and Global COE Program, Nagasaki University, 1-12-4 Sakamoto, 852-8523, Nagasaki, Japan
| | - Daniel Boamah
- Department of Immunogenetics, Institute of Tropical Medicine (NEKKEN), and Global COE Program, Nagasaki University, 1-12-4 Sakamoto, 852-8523, Nagasaki, Japan
| | - Masashi Kirinoki
- Laboratory of Tropical Medicine and Parasitology, Dokkyo Medical University, Tochigi, Japan
| | - Naoko Hayashi
- Laboratory of Tropical Medicine and Parasitology, Dokkyo Medical University, Tochigi, Japan
| | - Yuichi Chigusa
- Laboratory of Tropical Medicine and Parasitology, Dokkyo Medical University, Tochigi, Japan
| | - Yoshio Osada
- Department of Immunology and Parasitology, The University of Occupational and Environmental Health, Kitakyushu, Japan
| | - Shinjiro Hamano
- Department of Parasitology, Institute of Tropical Medicine (NEKKEN), and Global COE Program, Nagasaki University, 1-12-4 Sakamoto, 852-8523, Nagasaki, Japan
| | - Kenji Hirayama
- Department of Immunogenetics, Institute of Tropical Medicine (NEKKEN), and Global COE Program, Nagasaki University, 1-12-4 Sakamoto, 852-8523, Nagasaki, Japan
|