1
Chen JH, Landback P, Arsala D, Guzzetta A, Xia S, Atlas J, Sosa D, Zhang YE, Cheng J, Shen B, Long M. Evolutionarily new genes in humans with disease phenotypes reveal functional enrichment patterns shaped by adaptive innovation and sexual selection. bioRxiv 2024:2023.11.14.567139. [PMID: 38045239 PMCID: PMC10690195 DOI: 10.1101/2023.11.14.567139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
New genes (or young genes) are genetic novelties pivotal in mammalian evolution. However, their phenotypic impacts and evolutionary patterns over time remain elusive in humans due to the technical and ethical complexities of functional studies. Integrating gene age dating with Mendelian disease phenotyping, our research shows a gradual rise in disease gene proportion as gene age increases. Logistic regression modeling indicates that this increase in older genes may be related to their longer sequence lengths and higher burdens of deleterious de novo germline variants (DNVs). We also find a steady integration of new genes with biomedical phenotypes into the human genome over macroevolutionary timescales (~0.07% per million years). Despite this stable pace, we observe distinct patterns in phenotypic enrichment, pleiotropy, and selective pressures across gene ages. Notably, young genes show significant enrichment in diseases related to the male reproductive system, indicating strong sexual selection. Young genes also exhibit disease-related functions in tissues and systems potentially linked to human phenotypic innovations, such as increased brain size, musculoskeletal phenotypes, and color vision. We further reveal a logistic growth pattern of pleiotropy over evolutionary time, indicating a diminishing marginal growth of new functions for older genes due to intensifying selective constraints over time. We propose a pleiotropy-barrier model that delineates higher potentials for phenotypic innovation in young genes compared to older genes, a process that is subject to natural selection. Our study demonstrates that evolutionarily new genes are critical in influencing human reproductive evolution and adaptive phenotypic innovations driven by sexual and natural selection, with low pleiotropy as a selective advantage.
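The abstract reports that pleiotropy follows a logistic growth pattern over evolutionary time, with diminishing marginal gains in new functions for older genes. The Python sketch below only illustrates that shape. The functions `logistic` and `marginal_growth` and the parameters `K`, `R`, and `T0` are hypothetical illustrations, not values or code from the study.

```python
import math


def logistic(t: float, k: float, r: float, t0: float) -> float:
    """Logistic curve k / (1 + exp(-r * (t - t0))), which saturates at k."""
    return k / (1.0 + math.exp(-r * (t - t0)))


def marginal_growth(t: float, k: float, r: float, t0: float) -> float:
    """First derivative of the logistic curve: r * P * (1 - P / k)."""
    p = logistic(t, k, r, t0)
    return r * p * (1.0 - p / k)


# Hypothetical parameters: saturation level, growth rate, midpoint (gene age in Myr)
K, R, T0 = 10.0, 0.02, 300.0

if __name__ == "__main__":
    for age in (0, 150, 300, 450, 600):
        print(f"age={age:>3} Myr  pleiotropy={logistic(age, K, R, T0):.2f}  "
              f"marginal={marginal_growth(age, K, R, T0):.4f}")
```

Marginal growth peaks at the midpoint `T0` and shrinks for older ages. This is the diminishing-returns behavior the abstract describes. Fitting the curve to real pleiotropy counts would require the study's own data.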
Affiliation(s)
- Jian-Hai Chen
  - Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
  - Institutes for Systems Genetics, West China University Hospital, Chengdu 610041, China
- Patrick Landback
  - Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Deanna Arsala
  - Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Alexander Guzzetta
  - Department of Pathology, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Shengqian Xia
  - Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Jared Atlas
  - Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Dylan Sosa
  - Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Yong E. Zhang
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Jingqiu Cheng
  - Institutes for Systems Genetics, West China University Hospital, Chengdu 610041, China
- Bairong Shen
  - Institutes for Systems Genetics, West China University Hospital, Chengdu 610041, China
- Manyuan Long
  - Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637

2
Middendorf L, Ravi Iyengar B, Eicholt LA. Sequence, Structure, and Functional Space of Drosophila De Novo Proteins. Genome Biol Evol 2024; 16:evae176. [PMID: 39212966 PMCID: PMC11363682 DOI: 10.1093/gbe/evae176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/29/2024] [Indexed: 09/04/2024]
Abstract
During de novo emergence, new protein-coding genes arise from previously nongenic sequences. The de novo proteins they encode are dissimilar in composition and predicted biochemical properties to conserved proteins. However, functional de novo proteins do exist. Both identification of functional de novo proteins and their structural characterization are experimentally laborious. To identify functional and structured de novo proteins in silico, we applied recently developed machine-learning-based tools and found that most de novo proteins are indeed different from conserved proteins in both structure and sequence. However, some de novo proteins are predicted to adopt known protein folds, to participate in cellular reactions, and to form biomolecular condensates. Apart from broadening our understanding of de novo protein evolution, our study also provides a large set of testable hypotheses for focused experimental studies on the structure and function of de novo proteins in Drosophila.
Affiliation(s)
- Lasse Middendorf
  - Institute for Evolution and Biodiversity, University of Muenster, Huefferstrasse 1, 48149 Muenster, Germany
- Bharat Ravi Iyengar
  - Institute for Evolution and Biodiversity, University of Muenster, Huefferstrasse 1, 48149 Muenster, Germany
- Lars A Eicholt
  - Institute for Evolution and Biodiversity, University of Muenster, Huefferstrasse 1, 48149 Muenster, Germany

3
Rich A, Acar O, Carvunis AR. Massively integrated coexpression analysis reveals transcriptional regulation, evolution and cellular implications of the yeast noncanonical translatome. Genome Biol 2024; 25:183. [PMID: 38978079 PMCID: PMC11232214 DOI: 10.1186/s13059-024-03287-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Accepted: 05/20/2024] [Indexed: 07/10/2024]
Abstract
BACKGROUND: Recent studies uncovered pervasive transcription and translation of thousands of noncanonical open reading frames (nORFs) outside of annotated genes. The contribution of nORFs to cellular phenotypes is difficult to infer using conventional approaches because nORFs tend to be short, of recent de novo origin, and lowly expressed. Here we develop a dedicated coexpression analysis framework that accounts for low expression to investigate the transcriptional regulation, evolution, and potential cellular roles of nORFs in Saccharomyces cerevisiae.

RESULTS: Our results reveal that nORFs tend to be preferentially coexpressed with genes involved in cellular transport or homeostasis but rarely with genes involved in RNA processing. Mechanistically, we discover that young de novo nORFs located downstream of conserved genes tend to leverage their neighbors' promoters through transcription readthrough, resulting in high coexpression and high expression levels. Transcriptional piggybacking also influences the coexpression profiles of young de novo nORFs located upstream of genes, but to a lesser extent and without detectable impact on expression levels. Transcriptional piggybacking influences, but does not determine, the transcription profiles of de novo nORFs emerging near genes. About 40% of nORFs are not strongly coexpressed with any gene but are nonetheless transcriptionally regulated and tend to form entirely new transcription modules. We offer a web browser interface (https://carvunislab.csb.pitt.edu/shiny/coexpression/) to efficiently query, visualize, and download our coexpression inferences.

CONCLUSIONS: Our results suggest that nORF transcription is highly regulated. Our coexpression dataset serves as an unprecedented resource for unraveling how nORFs integrate into cellular networks, contribute to cellular phenotypes, and evolve.
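Coexpression is commonly quantified as a correlation between two expression profiles measured across the same samples. The Python sketch below computes a Spearman rank correlation between a hypothetical nORF profile and a hypothetical gene profile. It is a generic illustration only, not the authors' framework; the abstract does not describe how that framework handles low expression, and the function names here are hypothetical.

```python
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        # A profile with no variation (e.g. zero in every sample) has no defined correlation
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (vx * vy) ** 0.5


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical read counts for one nORF and one neighboring gene across five samples
norf = [0, 1, 3, 2, 5]
gene = [10, 20, 40, 30, 60]
print(spearman(norf, gene))  # 1.0
```

A lowly expressed nORF often has zero counts in many samples. Those zeros produce ties, which this sketch handles with average ranks, or an entirely constant profile, which it rejects. Either case shows why a naive correlation can be fragile for such transcripts.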
Affiliation(s)
- April Rich
  - Joint Carnegie Mellon University-University of Pittsburgh, University of Pittsburgh Computational Biology PhD Program, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
  - Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), University of Pittsburgh, Pittsburgh, PA, USA
- Omer Acar
  - Joint Carnegie Mellon University-University of Pittsburgh, University of Pittsburgh Computational Biology PhD Program, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
  - Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), University of Pittsburgh, Pittsburgh, PA, USA
- Anne-Ruxandra Carvunis
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
  - Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), University of Pittsburgh, Pittsburgh, PA, USA

4
Tan Q, Xiao J, Chen J, Wang Y, Zhang Z, Zhao T, Li Y. ifDEEPre: large protein language-based deep learning enables interpretable and fast predictions of enzyme commission numbers. Brief Bioinform 2024; 25:bbae225. [PMID: 38942594 PMCID: PMC11213619 DOI: 10.1093/bib/bbae225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Revised: 03/26/2024] [Accepted: 04/22/2024] [Indexed: 06/30/2024]
Abstract
Accurate understanding of the biological functions of enzymes is vital for various tasks in both pathology and industrial biotechnology. However, existing methods are usually not fast enough and lack explanations for their prediction results, which severely limits their real-world applications. Following our previous work, DEEPre, we propose a new interpretable and fast version (ifDEEPre) by designing novel self-guided attention and incorporating biological knowledge learned via large protein language models to accurately predict the Enzyme Commission (EC) numbers of enzymes and confirm their functions. The self-guided attention is designed to optimize the unique contributions of representations, automatically detecting key protein motifs to provide meaningful interpretations. Representations learned from raw protein sequences are strictly screened to improve the running speed of the framework, making it 50 times faster than DEEPre while requiring 12.89 times less storage space. Large language models are incorporated to learn physical properties from hundreds of millions of proteins, extending the biological knowledge of the whole network. Extensive experiments indicate that ifDEEPre outperforms all current methods, achieving an F1-score more than 14.22% higher on the NEW dataset. Furthermore, the trained ifDEEPre models accurately capture multi-level protein biological patterns and infer evolutionary trends of enzymes by taking only raw sequences without label information. Meanwhile, ifDEEPre predicts the evolutionary relationships between different yeast sub-species, which are highly consistent with the ground truth. Case studies indicate that ifDEEPre can detect key amino acid motifs, which have important implications for designing novel enzymes. A web server running ifDEEPre is available at https://proj.cse.cuhk.edu.hk/aihlab/ifdeepre/ to provide convenient services to the public, and ifDEEPre is freely available on GitHub at https://github.com/ml4bio/ifDEEPre/.
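The comparison above is stated in terms of the F1-score, the harmonic mean of precision and recall. The Python sketch below computes per-class F1 and its unweighted (macro) average over EC-number labels. The labels and function names are hypothetical, and the abstract does not say which averaging scheme the reported F1 uses.

```python
from collections import Counter


def f1_per_class(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """Per-label F1 from paired true and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[t] += 1  # true label t was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        pred_total = tp[label] + fp[label]
        true_total = tp[label] + fn[label]
        precision = tp[label] / pred_total if pred_total else 0.0
        recall = tp[label] / true_total if true_total else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of the per-label F1 scores."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical EC-number labels for four enzymes
y_true = ["1.1.1.1", "1.1.1.1", "2.7.11.1", "3.4.21.4"]
y_pred = ["1.1.1.1", "2.7.11.1", "2.7.11.1", "3.4.21.4"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7778
```

Macro averaging weights rare and common EC classes equally. Micro averaging instead pools all true positives, false positives, and false negatives, so the two can differ on imbalanced enzyme datasets.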
Affiliation(s)
- Qingxiong Tan
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
- Jin Xiao
  - Department of Computer Science, Hong Kong Baptist University, Hong Kong SAR, China
- Jiayang Chen
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
- Yixuan Wang
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
- Zeliang Zhang
  - Department of Computer Science, University of Rochester, Rochester, New York State, USA
  - School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Yu Li
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
  - The CUHK Shenzhen Research Institute, Nanshan, Shenzhen, China

5
Peng J, Zhao L. The origin and structural evolution of de novo genes in Drosophila. Nat Commun 2024; 15:810. [PMID: 38280868 PMCID: PMC10821953 DOI: 10.1038/s41467-024-45028-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Accepted: 01/09/2024] [Indexed: 01/29/2024]
Abstract
Recent studies reveal that de novo gene origination from previously non-genic sequences is a common mechanism for gene innovation. These young genes provide an opportunity to study the structural and functional origins of proteins. Here, we combine high-quality base-level whole-genome alignments and computational structural modeling to study the origination, evolution, and protein structures of lineage-specific de novo genes. We identify 555 de novo gene candidates in D. melanogaster that originated within the Drosophilinae lineage. Sequence composition, evolutionary rates, and expression patterns indicate possible gradual functional or adaptive shifts with their gene ages. Surprisingly, we find little overall protein structural changes in candidates from the Drosophilinae lineage. We identify several candidates with potentially well-folded protein structures. Ancestral sequence reconstruction analysis reveals that most potentially well-folded candidates are often born well-folded. Single-cell RNA-seq analysis in testis shows that although most de novo gene candidates are enriched in spermatocytes, several young candidates are biased towards the early spermatogenesis stage, indicating potentially important but less emphasized roles of early germline cells in the de novo gene origination in testis. This study provides a systematic overview of the origin, evolution, and protein structural changes of Drosophilinae-specific de novo genes.
Affiliation(s)
- Junhui Peng
  - Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA
- Li Zhao
  - Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA

6
Wacholder A, Carvunis AR. Biological factors and statistical limitations prevent detection of most noncanonical proteins by mass spectrometry. PLoS Biol 2023; 21:e3002409. [PMID: 38048358 PMCID: PMC10721188 DOI: 10.1371/journal.pbio.3002409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Revised: 12/14/2023] [Accepted: 10/30/2023] [Indexed: 12/06/2023]
Abstract
Ribosome profiling experiments indicate pervasive translation of short open reading frames (ORFs) outside of annotated protein-coding genes. However, shotgun mass spectrometry (MS) experiments typically detect only a small fraction of the predicted protein products of this noncanonical translation. The rarity of detection could indicate that most predicted noncanonical proteins are rapidly degraded and not present in the cell; alternatively, it could reflect technical limitations. Here, we leveraged recent advances in ribosome profiling and MS to investigate the factors limiting detection of noncanonical proteins in yeast. We show that the low detection rate of noncanonical ORF products can largely be explained by small size and low translation levels and does not indicate that they are unstable or biologically insignificant. In particular, proteins encoded by evolutionarily young genes, including those with well-characterized biological roles, are too short and too lowly expressed to be detected by shotgun MS at current detection sensitivities. Additionally, we find that decoy biases can give misleading estimates of noncanonical protein false discovery rates, potentially leading to false detections. After accounting for these issues, we found strong evidence for 4 noncanonical proteins in MS data, which were also supported by evolution and translation data. These results illustrate the power of MS to validate unannotated genes predicted by ribosome profiling, but also its substantial limitations in finding many biologically relevant lowly expressed proteins.
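Shotgun MS identifications are commonly filtered with a target-decoy strategy. Matches to reversed or shuffled decoy sequences are used to estimate the false discovery rate (FDR) among target matches above a score cutoff. The Python sketch below shows that generic estimate with hypothetical scores and hypothetical function names. It does not reproduce the decoy-bias analysis or the corrections described in the abstract.

```python
def target_decoy_fdr(targets: list[float], decoys: list[float], threshold: float) -> float:
    """Estimated FDR at a score cutoff: decoy hits / target hits (both >= threshold)."""
    n_target = sum(score >= threshold for score in targets)
    n_decoy = sum(score >= threshold for score in decoys)
    # No accepted targets means there are no discoveries whose error rate could be estimated
    return n_decoy / n_target if n_target else 0.0


def lowest_threshold(targets: list[float], decoys: list[float], max_fdr: float):
    """Scan target scores from low to high and return the first cutoff meeting max_fdr."""
    for cutoff in sorted(set(targets)):
        if target_decoy_fdr(targets, decoys, cutoff) <= max_fdr:
            return cutoff
    return None


# Hypothetical peptide-spectrum match scores
targets = [9.1, 8.4, 7.7, 6.0, 5.2, 4.8, 3.9, 3.1]
decoys = [5.0, 4.1, 3.5, 2.8]

print(target_decoy_fdr(targets, decoys, 5.0))  # 0.2
print(lowest_threshold(targets, decoys, 0.2))  # 4.8
```

Some pipelines add 1 to the decoy count for a more conservative estimate. The abstract's point is that decoy biases can make such estimates misleading for noncanonical proteins, so an FDR computed over all identifications should not be assumed to hold for that small subset.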
Affiliation(s)
- Aaron Wacholder
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
  - Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Anne-Ruxandra Carvunis
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
  - Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America

7
Chen J. Evolutionarily new genes in humans with disease phenotypes reveal functional enrichment patterns shaped by adaptive innovation and sexual selection. Research Square 2023:rs.3.rs-3632644. [PMID: 38045389 PMCID: PMC10690325 DOI: 10.21203/rs.3.rs-3632644/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
New genes (or young genes) are structural novelties pivotal in mammalian evolution. Their phenotypic impact on humans, however, remains elusive due to the technical and ethical complexities of functional studies. By combining gene age dating with Mendelian disease phenotyping, our research reveals that new genes associated with disease phenotypes steadily integrate into the human genome at a rate of ~0.07% every million years over macroevolutionary timescales. Despite this stable pace, we observe distinct patterns in phenotypic enrichment, pleiotropy, and selective pressures between young and old genes. Notably, young genes show significant enrichment in the male reproductive system, indicating strong sexual selection. Young genes also exhibit functions in tissues and systems potentially linked to human phenotypic innovations, such as increased brain size, bipedal locomotion, and color vision. Our findings further reveal increasing levels of pleiotropy over evolutionary time, accompanied by stronger selective constraints. We propose a "pleiotropy-barrier" model that delineates different potentials for phenotypic innovation between young and older genes subject to natural selection. Our study demonstrates that evolutionarily new genes are critical in influencing human reproductive evolution and adaptive phenotypic innovations driven by sexual and natural selection, with low pleiotropy as a selective advantage.

8
Wacholder A, Carvunis AR. Biological Factors and Statistical Limitations Prevent Detection of Most Noncanonical Proteins by Mass Spectrometry. bioRxiv 2023:2023.03.09.531963. [PMID: 36945638 PMCID: PMC10028962 DOI: 10.1101/2023.03.09.531963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/14/2023]
Abstract
Ribosome profiling experiments indicate pervasive translation of short open reading frames (ORFs) outside of annotated protein-coding genes. However, shotgun mass spectrometry experiments typically detect only a small fraction of the predicted protein products of this noncanonical translation. The rarity of detection could indicate that most predicted noncanonical proteins are rapidly degraded and not present in the cell; alternatively, it could reflect technical limitations. Here we leveraged recent advances in ribosome profiling and mass spectrometry to investigate the factors limiting detection of noncanonical proteins in yeast. We show that the low detection rate of noncanonical ORF products can largely be explained by small size and low translation levels and does not indicate that they are unstable or biologically insignificant. In particular, proteins encoded by evolutionarily young genes, including those with well-characterized biological roles, are too short and too lowly-expressed to be detected by shotgun mass spectrometry at current detection sensitivities. Additionally, we find that decoy biases can give misleading estimates of noncanonical protein false discovery rates, potentially leading to false detections. After accounting for these issues, we found strong evidence for four noncanonical proteins in mass spectrometry data, which were also supported by evolution and translation data. These results illustrate the power of mass spectrometry to validate unannotated genes predicted by ribosome profiling, but also its substantial limitations in finding many biologically relevant lowly-expressed proteins.

9
Lombardo KD, Sheehy HK, Cridland JM, Begun DJ. Identifying candidate de novo genes expressed in the somatic female reproductive tract of Drosophila melanogaster. G3 (Bethesda) 2023; 13:jkad122. [PMID: 37259569 PMCID: PMC10411569 DOI: 10.1093/g3journal/jkad122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2023] [Revised: 05/18/2023] [Accepted: 05/22/2023] [Indexed: 06/02/2023]
Abstract
Most eukaryotic genes have been vertically transmitted to the present from distant ancestors. However, variable gene number across species indicates that gene gain and loss also occur. While new genes typically originate as products of duplications and rearrangements of preexisting genes, putative de novo genes (genes born out of ancestrally nongenic sequence) have been identified. Previous studies of de novo genes in Drosophila have provided evidence that expression in male reproductive tissues is common. However, no studies have focused on female reproductive tissues. Here we begin addressing this gap in the literature by analyzing the transcriptomes of 3 female reproductive tract organs (spermatheca, seminal receptacle, and parovaria) in 3 species (our focal species, Drosophila melanogaster, and 2 closely related species, Drosophila simulans and Drosophila yakuba), with the goal of identifying putative D. melanogaster-specific de novo genes expressed in these tissues. We discovered several candidate genes, located in sequence annotated as intergenic. Consistent with the literature, these genes tend to be short, single-exon, and lowly expressed. We also find evidence that some of these genes are expressed in other D. melanogaster tissues and in both sexes. The relatively small number of intergenic candidate genes discovered here is similar to that observed in the accessory gland, but substantially fewer than that observed in the testis.
Affiliation(s)
- Kaelina D Lombardo
  - Department of Evolution and Ecology, University of California Davis, Davis, CA 95616, USA
- Hayley K Sheehy
  - Department of Evolution and Ecology, University of California Davis, Davis, CA 95616, USA
- Julie M Cridland
  - Department of Evolution and Ecology, University of California Davis, Davis, CA 95616, USA
- David J Begun
  - Department of Evolution and Ecology, University of California Davis, Davis, CA 95616, USA

10
Peng J, Zhao L. The origin and structural evolution of de novo genes in Drosophila. bioRxiv 2023:2023.03.13.532420. [PMID: 37425675 PMCID: PMC10326970 DOI: 10.1101/2023.03.13.532420] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 07/11/2023]
Abstract
Although previously thought to be unlikely, recent studies have shown that de novo gene origination from previously non-genic sequences is a relatively common mechanism for gene innovation in many species and taxa. These young genes provide a unique set of candidates to study the structural and functional origination of proteins. However, our understanding of their protein structures and how these structures originate and evolve is still limited, due to a lack of systematic studies. Here, we combined high-quality base-level whole-genome alignments, bioinformatic analysis, and computational structure modeling to study the origination, evolution, and protein structure of lineage-specific de novo genes. We identified 555 de novo gene candidates in D. melanogaster that originated within the Drosophilinae lineage. We found a gradual shift in sequence composition, evolutionary rates, and expression patterns with gene age, which indicates possible gradual shifts or adaptations of their functions. Surprisingly, we found little overall change in protein structure for de novo genes in the Drosophilinae lineage. Using AlphaFold2, ESMFold, and molecular dynamics, we identified a number of de novo gene candidates with protein products that are potentially well-folded, many of which are more likely to contain transmembrane and signal proteins compared to other annotated protein-coding genes. Using ancestral sequence reconstruction, we found that most potentially well-folded proteins are often born folded. Interestingly, we observed one case where disordered ancestral proteins became ordered within a relatively short evolutionary time. Single-cell RNA-seq analysis in testis showed that although most de novo genes are enriched in spermatocytes, several young de novo genes are biased toward the early spermatogenesis stage, indicating potentially important but less emphasized roles of early germline cells in de novo gene origination in testis. This study provides a systematic overview of the origin, evolution, and structural changes of Drosophilinae-specific de novo genes.
Affiliation(s)
- Junhui Peng
  - Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY 10065, USA
- Li Zhao
  - Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY 10065, USA

11
Davati N, Ghorbani A. Discovery of long non-coding RNAs in Aspergillus flavus response to water activity, CO2 concentration, and temperature changes. Sci Rep 2023; 13:10330. [PMID: 37365206 DOI: 10.1038/s41598-023-37236-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Accepted: 06/19/2023] [Indexed: 06/28/2023]
Abstract
Although the role of long non-coding RNAs (lncRNAs) in key biological processes in animals and plants has been confirmed for decades, their identification in fungi remains limited. In this study, we discovered and characterized lncRNAs in Aspergillus flavus in response to changes in water activity, CO2 concentration, and temperature, and predicted their regulatory roles in cellular functions. A total of 472 lncRNAs were identified in the genome of A. flavus, consisting of 470 novel lncRNAs and 2 putative lncRNAs (EFT00053849670 and EFT00053849665). Our analysis of lncRNA expression revealed significant differential expression under stress conditions in A. flavus. Our findings indicate that lncRNAs in A. flavus, particularly down-regulated lncRNAs, may play pivotal regulatory roles in aflatoxin biosynthesis, respiratory activities, cellular survival, and metabolic maintenance under stress conditions. Additionally, we predicted that sense lncRNAs down-regulated by a temperature of 30 °C, osmotic stress, and CO2 concentration might indirectly regulate proline metabolism. Furthermore, subcellular localization analysis revealed that up- and down-regulated lncRNAs are frequently localized in the nucleus under stress conditions, particularly at a water activity of 0.91, while most up-regulated lncRNAs may be located in the cytoplasm under high CO2 concentration.
Affiliation(s)
- Nafiseh Davati
  - Department of Food Science and Technology, College of Food Industry, Bu-Ali Sina University, Hamedan, 65167-38695, Iran
- Abozar Ghorbani
  - Nuclear Agriculture Research School, Nuclear Science and Technology Research Institute (NSTRI), Karaj, Iran

12
Fakhar AZ, Liu J, Pajerowska-Mukhtar KM, Mukhtar MS. The Lost and Found: Unraveling the Functions of Orphan Genes. J Dev Biol 2023; 11:27. [PMID: 37367481 PMCID: PMC10299390 DOI: 10.3390/jdb11020027] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/08/2023] [Revised: 05/19/2023] [Accepted: 05/26/2023] [Indexed: 06/28/2023]
Abstract
Orphan Genes (OGs) are a mysterious class of genes that have recently gained significant attention. Despite lacking a clear evolutionary history, they are found in nearly all living organisms, from bacteria to humans, and they play important roles in diverse biological processes. OGs were first discovered through comparative genomics, followed by the identification of unique genes across different species. OGs tend to be more prevalent in species with larger genomes, such as plants and animals; their evolutionary origins remain unclear, but they potentially arise from gene duplication, horizontal gene transfer (HGT), or de novo origination. Although their precise functions are not well understood, OGs have been implicated in crucial biological processes such as development, metabolism, and stress responses. To better understand their significance, researchers are using a variety of approaches, including transcriptomics, functional genomics, and molecular biology. This review offers a comprehensive overview of the current knowledge of OGs in all domains of life, highlighting the possible role of dark transcriptomics in their evolution. More research is needed to fully comprehend the role of OGs in biology and their impact on various biological processes.
Affiliation(s)
- M. Shahid Mukhtar
  - Department of Biology, University of Alabama at Birmingham, 1300 University Blvd., Birmingham, AL 35294, USA

13
Wacholder A, Parikh SB, Coelho NC, Acar O, Houghton C, Chou L, Carvunis AR. A vast evolutionarily transient translatome contributes to phenotype and fitness. Cell Syst 2023; 14:363-381.e8. [PMID: 37164009 PMCID: PMC10348077 DOI: 10.1016/j.cels.2023.04.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/18/2022] [Revised: 01/30/2023] [Accepted: 04/06/2023] [Indexed: 05/12/2023]
Abstract
Translation is the process by which ribosomes synthesize proteins. Ribosome profiling recently revealed that many short sequences previously thought to be noncoding are pervasively translated. To identify protein-coding genes in this noncanonical translatome, we combine an integrative framework for extremely sensitive ribosome profiling analysis, iRibo, with high-powered selection inferences tailored for short sequences. We construct a reference translatome for Saccharomyces cerevisiae comprising 5,400 canonical and almost 19,000 noncanonical translated elements. Only 14 noncanonical elements were evolving under detectable purifying selection. A representative subset of translated elements lacking signatures of selection demonstrated involvement in processes including DNA repair, stress response, and post-transcriptional regulation. Our results suggest that most translated elements are not conserved protein-coding genes and contribute to genotype-phenotype relationships through fast-evolving molecular mechanisms.
Affiliation(s)
- Aaron Wacholder
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Saurin Bipin Parikh
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Integrative Systems Biology Program, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Nelson Castilho Coelho
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Omer Acar
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Joint CMU-Pitt PhD Program in Computational Biology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Carly Houghton
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Joint CMU-Pitt PhD Program in Computational Biology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Lin Chou
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Integrative Systems Biology Program, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
14
Lombardo KD, Sheehy HK, Cridland JM, Begun DJ. Identifying candidate de novo genes expressed in the somatic female reproductive tract of Drosophila melanogaster. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.03.539262. [PMID: 37205537 PMCID: PMC10187257 DOI: 10.1101/2023.05.03.539262] [Indexed: 05/21/2023]
Abstract
Most eukaryotic genes have been vertically transmitted to the present from distant ancestors. However, variable gene number across species indicates that gene gain and loss also occur. While new genes typically originate as products of duplications and rearrangements of pre-existing genes, putative de novo genes (genes born out of previously non-genic sequence) have been identified. Previous studies of de novo genes in Drosophila have provided evidence that expression in male reproductive tissues is common. However, no studies have focused on female reproductive tissues. Here we begin addressing this gap in the literature by analyzing the transcriptomes of three female reproductive tract organs (spermatheca, seminal receptacle, and parovaria) in three species: our focal species, D. melanogaster, and two closely related species, D. simulans and D. yakuba. Our goal was to identify putative D. melanogaster-specific de novo genes expressed in these tissues. We discovered several candidate genes, which, consistent with the literature, tend to be short, simple, and lowly expressed. We also find evidence that some of these genes are expressed in other D. melanogaster tissues and in both sexes. The relatively small number of candidate genes discovered here is similar to that observed in the accessory gland, but substantially smaller than that observed in the testis.
Affiliation(s)
- Kaelina D Lombardo
- Department of Evolution and Ecology, University of California, Davis, CA 95616
- Hayley K Sheehy
- Department of Evolution and Ecology, University of California, Davis, CA 95616
- Julie M Cridland
- Department of Evolution and Ecology, University of California, Davis, CA 95616
- David J Begun
- Department of Evolution and Ecology, University of California, Davis, CA 95616
15
Chen N, Yang S, You D, Shen J, Ruan B, Wu M, Zhang J, Luo X, Tang H. Systematic genetic modifications of cell wall biosynthesis enhanced the secretion and surface-display of polysaccharide degrading enzymes in Saccharomyces cerevisiae. Metab Eng 2023; 77:273-282. [PMID: 37100192 DOI: 10.1016/j.ymben.2023.04.011] [Received: 11/07/2022] [Revised: 03/31/2023] [Accepted: 04/15/2023] [Indexed: 04/28/2023]
Abstract
Saccharomyces cerevisiae is a robust cell factory for secreting or surface-displaying cellulase and amylase to convert agricultural residues into valuable chemicals. Engineering the secretory pathway is a well-known strategy for overproducing these enzymes. Although cell wall biosynthesis can be tightly linked to the secretory pathway through regulation of all the processes involved, the effect of modifying it on protein production has not been extensively studied. In this study, we systematically examined the effect of engineering cell wall biosynthesis on the activity of the cellulolytic enzyme β-glucosidase (BGL1) by comparing seventy-nine gene knockout S. cerevisiae strains, and newly identified that inactivation of DFG5, YPK1, FYV5, CCW12 and KRE1 markedly improved BGL1 secretion and surface display. Combinatorial modifications of these genes, particularly double deletion of FYV5 and CCW12, together with the use of rich medium, increased the activity of secreted and surface-displayed BGL1 by 6.13-fold and 7.99-fold, respectively. Additionally, we applied this strategy to improve the activity of the cellulolytic cellobiohydrolase and the amylolytic α-amylase. Through proteomic analysis coupled with reverse engineering, we found that, in addition to the secretory pathway, regulation of translation may also contribute to the improved enzyme activity obtained by engineering cell wall biosynthesis. Our work provides new insight into the construction of yeast cell factories for efficient production of polysaccharide-degrading enzymes.
Affiliation(s)
- Nanzhu Chen
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Shuo Yang
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China; State Key Laboratory of Biobased Material and Green Papermaking, School of Bioengineering, Key Laboratory of Shandong Microbial Engineering, Qilu University of Technology, 3501 Daxue Road, Jinan, 250353, China
- Dawei You
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Junfeng Shen
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Banlai Ruan
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Mei Wu
- Synceres Biosciences (Shenzhen) Co., Ltd, Nanshan Medical Device Industrial Park, Nanhai Avenue, Shenzhen, 518067, China
- Jianzhi Zhang
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Xiaozhou Luo
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Hongting Tang
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
16
Aubel M, Eicholt L, Bornberg-Bauer E. Assessing structure and disorder prediction tools for de novo emerged proteins in the age of machine learning. F1000Res 2023; 12:347. [PMID: 37113259 PMCID: PMC10126731 DOI: 10.12688/f1000research.130443.1] [Accepted: 03/17/2023] [Indexed: 03/31/2023]
Abstract
Background: De novo protein-coding genes emerge from scratch in the non-coding regions of the genome and, by definition, have no homology to other genes. Therefore, the de novo proteins they encode belong to the so-called "dark protein space". So far, only four de novo protein structures have been experimentally approximated. Low homology, presumed high disorder and limited structural data result in low-confidence structural predictions for de novo proteins in most cases. Here, we examine the most widely used structure and disorder predictors and assess their applicability to de novo emerged proteins. Since AlphaFold2 relies on generating multiple sequence alignments and was trained on solved structures of largely conserved and globular proteins, its performance on de novo proteins remains unknown. More recently, natural language models of proteins have been used for alignment-free structure prediction, potentially making them more suitable for de novo proteins than AlphaFold2. Methods: We applied different disorder predictors (IUPred3 short/long, flDPnn) and structure predictors, AlphaFold2 on the one hand and language-based models (Omegafold, ESMfold, RGN2) on the other, to four de novo proteins with experimental evidence on structure. We compared the resulting predictions between the different predictors as well as against the existing experimental evidence. Results: Results from IUPred, the most widely used disorder predictor, depend heavily on the choice of parameters and differ significantly from those of flDPnn, which a recent comparative assessment study found to outperform most other predictors. Similarly, different structure predictors yielded varying results and confidence scores for de novo proteins.
Conclusions: We suggest that, while in some cases protein language model based approaches might be more accurate than AlphaFold2, the structure prediction of de novo emerged proteins remains a difficult task for any predictor, be it disorder or structure.
Affiliation(s)
- Margaux Aubel
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, 48149, Germany
- Lars Eicholt
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, 48149, Germany
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, 48149, Germany
- Department of Protein Evolution, Max Planck Institute for Biology, Tuebingen, 72076, Germany
17
Parikh SB, Houghton C, Van Oss SB, Wacholder A, Carvunis A. Origins, evolution, and physiological implications of de novo genes in yeast. Yeast 2022; 39:471-481. [PMID: 35959631 PMCID: PMC9544372 DOI: 10.1002/yea.3810] [Received: 05/20/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 12/03/2022]
Abstract
De novo gene birth is the process by which new genes emerge in sequences that were previously noncoding. Over the past decade, researchers have taken advantage of the power of yeast as a model and a tool to study the evolutionary mechanisms and physiological implications of de novo gene birth. We summarize the mechanisms that have been proposed to explicate how noncoding sequences can become protein-coding genes, highlighting the discovery of pervasive translation of the yeast transcriptome and its presumed impact on evolutionary innovation. We summarize current best practices for the identification and characterization of de novo genes. Crucially, we explain that the field is still in its nascency, with the physiological roles of most young yeast de novo genes identified thus far still utterly unknown. We hope this review inspires researchers to investigate the true contribution of de novo gene birth to cellular physiology and phenotypic diversity across yeast strains and species.
Affiliation(s)
- Saurin B. Parikh
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Carly Houghton
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- S. Branden Van Oss
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Aaron Wacholder
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Anne‐Ruxandra Carvunis
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
18
Abstract
"De novo" genes evolve from previously non-genic DNA. This strikes many of us as remarkable, because it seems extraordinarily unlikely that random sequence would produce a functional gene. How is this possible? In this two-part review, I first summarize what is known about the origins and molecular functions of the small number of de novo genes for which such information is available. I then speculate on what these examples may tell us about how de novo genes manage to emerge despite what seem like enormous opposing odds.
Affiliation(s)
- Caroline M Weisman
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
19
Suenaga Y, Kato M, Nagai M, Nakatani K, Kogashi H, Kobatake M, Makino T. Open reading frame dominance indicates protein-coding potential of RNAs. EMBO Rep 2022; 23:e54321. [PMID: 35438231 PMCID: PMC9171421 DOI: 10.15252/embr.202154321] [Received: 11/12/2021] [Revised: 03/24/2022] [Accepted: 03/25/2022] [Indexed: 11/13/2022]
Abstract
Recent studies have identified numerous RNAs with both coding and noncoding functions. However, the sequence characteristics that determine this bifunctionality remain largely unknown. In the present study, we develop and test the open reading frame (ORF) dominance score, which we define as the length of the longest ORF divided by the sum of the lengths of all putative ORFs. This score correlates with translation efficiency in coding transcripts and with translation of noncoding RNAs. In bacteria and archaea, coding and noncoding transcripts have narrow distributions of high and low ORF dominance, respectively, whereas those of eukaryotes show relatively broader ORF dominance distributions, with considerable overlap between coding and noncoding transcripts. The extent of overlap correlates positively with the genomic mutation rate and negatively with the effective population size of the species. Tissue-specific transcripts show higher ORF dominance than ubiquitously expressed transcripts, and the majority of tissue-specific transcripts are expressed in mature testes. These data suggest that the decrease in population size and the emergence of testes in eukaryotic organisms allowed for the evolution of potentially bifunctional RNAs.
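The ORF dominance score defined above can be illustrated with a short Python sketch. This is not the authors' pipeline, and its ORF definition is an assumption: putative ORFs are taken to be ATG-initiated, stop-terminated stretches in the three forward frames of the transcript, only the first ATG before each in-frame stop is counted (nested in-frame ORFs are ignored), no minimum length is applied, and lengths include the stop codon. The published score may define putative ORFs differently. The function names `find_orfs` and `orf_dominance` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str) -> list[int]:
    """Return lengths (nt, stop codon included) of putative ORFs.

    Assumed definition: scan the three forward frames; an ORF opens at the
    first ATG seen while no ORF is open and closes at the next in-frame stop.
    ORFs that reach the end of the sequence without a stop are discarded.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    lengths = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                lengths.append(i + 3 - start)
                start = None
    return lengths


def orf_dominance(seq: str) -> float:
    """Longest ORF length divided by the summed length of all putative ORFs."""
    lengths = find_orfs(seq)
    if not lengths:
        return 0.0  # no putative ORFs: treated here as zero dominance
    return max(lengths) / sum(lengths)


if __name__ == "__main__":
    # One 6-nt ORF and one 12-nt ORF in frame 0: dominance = 12 / 18
    print(orf_dominance("ATGTAAATGAAAAAATAA"))
```

A transcript with a single ORF scores 1.0, and the score falls as competing ORFs accumulate, which matches the intuition that a "dominant" ORF signals coding potential.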
Affiliation(s)
- Yusuke Suenaga
- Department of Molecular Carcinogenesis, Chiba Cancer Centre Research Institute, Chiba, Japan
- Mamoru Kato
- Division of Bioinformatics, National Cancer Centre Research Institute, Tokyo, Japan
- Momoko Nagai
- Division of Bioinformatics, National Cancer Centre Research Institute, Tokyo, Japan
- Kazuma Nakatani
- Department of Molecular Carcinogenesis, Chiba Cancer Centre Research Institute, Chiba, Japan
- Department of Molecular Biology and Oncology, Chiba University School of Medicine, Chiba, Japan
- Innovative Medicine CHIBA Doctoral WISE Program, Chiba University School of Medicine, Chiba, Japan
- Hiroyuki Kogashi
- Department of Molecular Carcinogenesis, Chiba Cancer Centre Research Institute, Chiba, Japan
- Department of Molecular Biology and Oncology, Chiba University School of Medicine, Chiba, Japan
- Miho Kobatake
- Department of Molecular Carcinogenesis, Chiba Cancer Centre Research Institute, Chiba, Japan
- Takashi Makino
- Laboratory of Evolutionary Genomics, Graduate School of Life Sciences, Tohoku University, Sendai, Japan
20
Tanvir R, Ping W, Sun J, Cain M, Li X, Li L. AtQQS orphan gene and NtNF-YC4 boost protein accumulation and pest resistance in tobacco (Nicotiana tabacum). PLANT SCIENCE : AN INTERNATIONAL JOURNAL OF EXPERIMENTAL PLANT BIOLOGY 2022; 317:111198. [PMID: 35193747 DOI: 10.1016/j.plantsci.2022.111198] [Received: 09/06/2021] [Revised: 12/07/2021] [Accepted: 01/26/2022] [Indexed: 05/19/2023]
Abstract
Qua-Quine Starch (QQS), an orphan gene found exclusively in Arabidopsis thaliana, interacts with Nuclear Factor Y subunit C4 (NF-YC4) and regulates carbon and nitrogen allocation in different plant species. Several studies have uncovered its potential to increase total protein and resistance against pathogens and pests in Arabidopsis and soybean. However, it is still unclear whether these benefits of QQS are universal among flowering plants. Here we studied the influence of AtQQS and Nicotiana tabacum NF-YC4 (NtNF-YC4) on starch and protein content and on pest resistance in tobacco. Our results showed that both AtQQS and NtNF-YC4 had a positive impact on the plant's total protein accumulation. We also observed reduced starch biosynthesis and increased resistance against common pests such as whiteflies (Bemisia tabaci) and aphids (Myzus persicae) in tobacco plants expressing AtQQS or overexpressing NtNF-YC4. Real-time PCR also revealed increased NF-YC4 expression after aphid infestation in tobacco varieties with higher pest resistance, but decreased or unchanged NF-YC4 expression in varieties susceptible to pests. Further analysis revealed that QQS expression and overexpression of NtNF-YC4 strongly repressed the expression of genes such as the sugar transporter SWEET10 and Flowering Locus T (FT), suggesting that SWEET10 and FT are involved in QQS- and NF-YC4-mediated carbon and nitrogen allocation in tobacco. Our data suggest that the activity of species-specific orphan genes may not be limited to the original species or its close relatives. Sequence alignment revealed a conserved sequence among the NF-YC4s of different plant species that may be responsible for the resulting shifts in metabolism and pest resistance. Analysis of cis-acting DNA elements in the NtNF-YC4 promoter region may outline potential mechanisms for these phenotypic changes.
Affiliation(s)
- Rezwan Tanvir
- Department of Biological Sciences, Mississippi State University, Mississippi State, MS 39762, USA
- Wenli Ping
- Department of Biological Sciences, Mississippi State University, Mississippi State, MS 39762, USA; Institute of Tobacco, Henan Academy of Agricultural Sciences, Key Laboratory for Green Preservation & Control of Tobacco Diseases and Pests in Huanghuai Growing Area, Zhengzhou, Henan 450002, China
- Jiping Sun
- Institute of Tobacco, Henan Academy of Agricultural Sciences, Key Laboratory for Green Preservation & Control of Tobacco Diseases and Pests in Huanghuai Growing Area, Zhengzhou, Henan 450002, China
- Morgan Cain
- Department of Biological Sciences, Mississippi State University, Mississippi State, MS 39762, USA
- Xuejun Li
- Institute of Tobacco, Henan Academy of Agricultural Sciences, Key Laboratory for Green Preservation & Control of Tobacco Diseases and Pests in Huanghuai Growing Area, Zhengzhou, Henan 450002, China
- Ling Li
- Department of Biological Sciences, Mississippi State University, Mississippi State, MS 39762, USA
21
Montini N, Doughty TW, Domenzain I, Fenton DA, Baranov PV, Harrington R, Nielsen J, Siewers V, Morrissey JP. Identification of a novel gene required for competitive growth at high temperature in the thermotolerant yeast Kluyveromyces marxianus. MICROBIOLOGY (READING, ENGLAND) 2022; 168. [PMID: 35333706 PMCID: PMC9558357 DOI: 10.1099/mic.0.001148] [Indexed: 12/19/2022]
Abstract
It is important to understand the basis of thermotolerance in yeasts to broaden their application in industrial biotechnology. The capacity to run bioprocesses at temperatures above 40 °C is of great interest, but this is beyond the growth range of most commonly used yeast species. In contrast, some industrial yeasts such as Kluyveromyces marxianus can grow at temperatures of 45 °C or higher. Such species are valuable both for direct use in industrial biotechnology and as a vehicle to study the genetic and physiological basis of yeast thermotolerance. In previous work, we reported that evolutionarily young genes disproportionately changed expression when yeast grew under stressful conditions, and postulated that such genes could be important for long-term adaptation to stress. Here, we tested this hypothesis in K. marxianus by identifying and studying species-specific genes that showed increased expression during high-temperature growth. Twelve such genes were identified, and 11 were successfully inactivated using CRISPR-mediated mutagenesis. One gene, KLMX_70384, is required for competitive growth at high temperature, supporting the hypothesis that evolutionarily young genes could play roles in adaptation to harsh environments. KLMX_70384 is predicted to encode an 83 aa peptide, and RNA sequencing and ribosome profiling (Ribo-seq) were used to confirm transcription and translation of the gene. The precise function of KLMX_70384 remains unknown, but some features are suggestive of RNA-binding activity. The gene is located in what was previously considered an intergenic region of the genome, and it lacks homologues in other yeasts or in databases. Overall, the data support the hypothesis that genes that arose de novo in K. marxianus after the speciation event that separated K. marxianus and K. lactis contribute to some of its unique traits.
Affiliation(s)
- Noemi Montini
- School of Microbiology, APC Microbiome Ireland, Environmental Research Institute and SUSFERM Centre, University College Cork, Cork T12 K8AF, Ireland
- Tyler W Doughty
- Department of Biology and Biological Engineering, Chalmers University of Technology, SE-41296 Gothenburg, Sweden
- Iván Domenzain
- Department of Biology and Biological Engineering, Chalmers University of Technology, SE-41296 Gothenburg, Sweden
- Darren A Fenton
- School of Microbiology, APC Microbiome Ireland, Environmental Research Institute and SUSFERM Centre, University College Cork, Cork T12 K8AF, Ireland; School of Biochemistry and Cell Biology, University College Cork, Cork T12 K8AF, Ireland
- Pavel V Baranov
- School of Biochemistry and Cell Biology, University College Cork, Cork T12 K8AF, Ireland
- Ronan Harrington
- School of Microbiology, APC Microbiome Ireland, Environmental Research Institute and SUSFERM Centre, University College Cork, Cork T12 K8AF, Ireland
- Jens Nielsen
- Department of Biology and Biological Engineering, Chalmers University of Technology, SE-41296 Gothenburg, Sweden
- Verena Siewers
- Department of Biology and Biological Engineering, Chalmers University of Technology, SE-41296 Gothenburg, Sweden
- John P Morrissey
- School of Microbiology, APC Microbiome Ireland, Environmental Research Institute and SUSFERM Centre, University College Cork, Cork T12 K8AF, Ireland
22
New Genomic Signals Underlying the Emergence of Human Proto-Genes. Genes (Basel) 2022; 13:genes13020284. [PMID: 35205330 PMCID: PMC8871994 DOI: 10.3390/genes13020284] [Received: 12/17/2021] [Revised: 01/20/2022] [Accepted: 01/24/2022] [Indexed: 12/04/2022]
Abstract
De novo genes are novel genes that emerge from non-coding DNA. So far, little is known about how the properties of de novo genes relate to their age and mechanisms of emergence. In this study, we investigate four related properties (introns, upstream regulatory motifs, 5′ untranslated regions (UTRs), and protein domains) in 23,135 human proto-genes. We found that proto-genes contain introns whose number and position correlate with the genomic position of proto-gene emergence. The origin of these introns is debated, as our results suggest that 41% of proto-genes might have captured existing introns, and 13.7% of them do not splice the ORF. We show that proto-genes that emerged via overprinting tend to be more enriched in core promoter motifs, whereas intergenic and intronic proto-genes are more enriched in enhancers, even though the TATA motif is the most common upstream motif in these genes. Intergenic and intronic 5′ UTRs of proto-genes have a lower potential to stabilise mRNA structures than those of exonic proto-genes and established human genes. Finally, we confirm that proteins expressed by proto-genes gain new putative domains with age. Overall, we find that regulatory motifs inducing transcription and translation of previously non-coding sequences may facilitate proto-gene emergence. Our study demonstrates that introns, 5′ UTRs, and domains have specific properties in proto-genes. We also emphasize that the genomic positions of de novo genes strongly impact these properties.
23
Cridland JM, Majane AC, Zhao L, Begun DJ. Population biology of accessory gland-expressed de novo genes in Drosophila melanogaster. Genetics 2022; 220:iyab207. [PMID: 34791207 PMCID: PMC8733444 DOI: 10.1093/genetics/iyab207] [Received: 09/14/2021] [Accepted: 11/08/2021] [Indexed: 12/20/2022]
Abstract
Early work on de novo gene discovery in Drosophila was consistent with the idea that many such genes have male-biased patterns of expression, including a large number expressed in the testis. However, there has been little formal analysis of variation in the abundance and properties of de novo genes expressed in different tissues. Here, we investigate the population biology of recently evolved de novo genes expressed in the Drosophila melanogaster accessory gland, a somatic male tissue that plays an important role in male and female fertility and in the post-mating response of females. We use the same collection of inbred lines previously used to identify testis-expressed de novo genes, which allows direct cross-tissue comparisons of these genes in two tissues of male reproduction. Using RNA-seq data, we identify candidate de novo genes located in annotated intergenic and intronic sequence and determine their properties, including chromosomal location, expression, abundance, and coding capacity. Generally, we find major differences between the tissues in terms of gene abundance and expression, though other properties such as transcript length and chromosomal distribution are more similar. We also explore differences between the regulatory mechanisms of de novo genes in the two tissues and how such differences may interact with selection to produce differences in D. melanogaster de novo genes expressed in the two tissues.
Affiliation(s)
- Julie M Cridland
- Department of Evolution and Ecology, University of California, Davis, Davis, CA 95616, USA
- Alex C Majane
- Department of Evolution and Ecology, University of California, Davis, Davis, CA 95616, USA
- Li Zhao
- Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY 10065, USA
- David J Begun
- Department of Evolution and Ecology, University of California, Davis, Davis, CA 95616, USA
24
Li J, Singh U, Bhandary P, Campbell J, Arendsee Z, Seetharam AS, Wurtele ES. Foster thy young: enhanced prediction of orphan genes in assembled genomes. Nucleic Acids Res 2021; 50:e37. [PMID: 34928390 PMCID: PMC9023268 DOI: 10.1093/nar/gkab1238] [Received: 08/27/2021] [Revised: 10/22/2021] [Accepted: 12/02/2021] [Indexed: 02/06/2023]
Abstract
Proteins encoded by newly emerged genes ('orphan genes') share no sequence similarity with proteins in any other species. They provide organisms with a reservoir of genetic elements for responding quickly to changing selection pressures. Here, we systematically assess the ability of five gene prediction pipelines to accurately predict genes in genomes according to their phylostratal origin. BRAKER and MAKER are existing, popular ab initio tools that infer gene structures by machine learning. Direct Inference is an evidence-based pipeline we developed to predict gene structures from alignments of RNA-Seq data. The BIND pipeline integrates the ab initio predictions of BRAKER with Direct Inference; MIND combines Direct Inference and MAKER predictions. We use highly curated Arabidopsis and yeast annotations as gold-standard benchmarks, and cross-validate in rice. Every pipeline under-predicts orphan genes (recovering as few as 11 percent under one prediction scenario). Increasing RNA-Seq diversity greatly improves prediction efficacy. The combined methods (BIND and MIND) yield the best predictions overall: BIND identifies 68% of annotated orphan genes and 99% of ancient genes, and gives the highest sensitivity score regardless of dataset in Arabidopsis. We provide a lightweight, flexible, reproducible, and well-documented solution to improve gene prediction.
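The comparison above scores each pipeline separately by gene age, so a pipeline can look strong overall while missing most orphan genes. A minimal Python sketch of per-phylostratum sensitivity (recall) is shown below. It is a hypothetical illustration, not the authors' evaluation code: it assumes each predicted gene has already been matched to an annotated gene ID (the matching step, which in practice depends on structural overlap with the gold-standard annotation, is not shown), and the function name `sensitivity_by_stratum` and the stratum labels are invented for this example.

```python
from collections import defaultdict


def sensitivity_by_stratum(annotated: dict[str, str], predicted: set[str]) -> dict[str, float]:
    """Fraction of annotated genes recovered, computed per phylostratum.

    annotated: gene ID -> phylostratum label (e.g. "orphan", "ancient").
    predicted: IDs of annotated genes that a pipeline recovered
               (assumed to be pre-matched to the annotation).
    """
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for gene_id, stratum in annotated.items():
        totals[stratum] += 1
        if gene_id in predicted:
            hits[stratum] += 1
    return {stratum: hits[stratum] / totals[stratum] for stratum in totals}


if __name__ == "__main__":
    gold = {"g1": "orphan", "g2": "orphan", "g3": "ancient", "g4": "ancient"}
    print(sensitivity_by_stratum(gold, {"g1", "g3", "g4"}))
```

Predicted IDs absent from the annotation are ignored, because sensitivity only counts recovered annotated genes; measuring precision would require keeping them.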
Collapse
Affiliation(s)
- Jing Li
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50014, USA; Center for Metabolic Biology, Iowa State University, Ames, IA 50014, USA; Genetics and Genomics Graduate Program, Iowa State University, Ames, IA 50014, USA
- Urminder Singh
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50014, USA; Center for Metabolic Biology, Iowa State University, Ames, IA 50014, USA; Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50014, USA
- Priyanka Bhandary
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50014, USA; Center for Metabolic Biology, Iowa State University, Ames, IA 50014, USA; Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50014, USA
- Jacqueline Campbell
- Corn Insects and Crop Genetics Research Unit, US Department of Agriculture, Agricultural Research Service, Ames, IA 50014, USA
- Zebulun Arendsee
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50014, USA; Center for Metabolic Biology, Iowa State University, Ames, IA 50014, USA; Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50014, USA
- Arun S Seetharam
- Genome Informatics Facility, Iowa State University, Ames, IA 50014, USA
- Eve Syrkin Wurtele
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50014, USA; Center for Metabolic Biology, Iowa State University, Ames, IA 50014, USA; Genetics and Genomics Graduate Program, Iowa State University, Ames, IA 50014, USA; Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50014, USA
25
Cherezov RO, Vorontsova JE, Simonova OB. The Phenomenon of Evolutionary “De Novo Generation” of Genes. Russ J Dev Biol 2021. [DOI: 10.1134/s1062360421060035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
26
Castro JF, Tautz D. The Effects of Sequence Length and Composition of Random Sequence Peptides on the Growth of E. coli Cells. Genes (Basel) 2021; 12:1913. [PMID: 34946861 PMCID: PMC8702183 DOI: 10.3390/genes12121913] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/17/2021] [Revised: 11/22/2021] [Accepted: 11/26/2021] [Indexed: 12/21/2022]
Abstract
We study the potential for the de novo evolution of genes from random nucleotide sequences using libraries of E. coli expressing random sequence peptides. We assess the effects of such peptides on cell growth by monitoring frequency changes of individual clones in a complex library through four serial passages. Using a new analysis pipeline that allows peptides of all lengths to be traced, we find that over half of the peptides have consistent effects on cell growth. Across nine different experiments, around 16% of clones increase in frequency and 36% decrease, with some variation between individual experiments. Shorter peptides (8-20 residues) are more likely to increase in frequency, whereas longer ones are more likely to decrease. GC content, amino acid composition, intrinsic disorder, and aggregation propensity show slightly different patterns between peptide groups. Sequences that increase in frequency tend to be more disordered and to have lower aggregation propensity. This coincides with the observation that young genes with more disordered structures are better tolerated in genomes. Our data indicate that random sequences can be a source of evolutionary innovation, since a large fraction of them are well tolerated by cells or can provide a growth advantage.
Affiliation(s)
- Diethard Tautz
- Max Planck Institute for Evolutionary Biology, August-Thienemann Strasse 2, 24306 Plön, Germany
27
Li J, Liu X, Yin Z, Hu Z, Zhang KQ. An Overview on Identification and Regulatory Mechanisms of Long Non-coding RNAs in Fungi. Front Microbiol 2021; 12:638617. [PMID: 33995298 PMCID: PMC8113380 DOI: 10.3389/fmicb.2021.638617] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 12/07/2020] [Accepted: 04/06/2021] [Indexed: 01/04/2023]
Abstract
Over the past decades, a growing number of long non-coding RNAs (lncRNAs) have been shown to play important roles in key biological processes of different organisms. At present, most identified lncRNAs, and most of those with known functional roles, are from mammalian systems. However, lncRNAs have also been found in primitive eukaryotic fungi, where they perform various functions in fungal development, metabolism, and pathogenicity. In this review, we highlight recent research on lncRNAs in primitive eukaryotic fungi, focusing in particular on the identification of lncRNAs and their regulatory roles in diverse biological processes.
Affiliation(s)
- Juan Li
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, China
- Xiaoying Liu
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, China
- Ziyu Yin
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, China
- Zhihong Hu
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, China
- Ke-Qin Zhang
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, China
28
Uncovering de novo gene birth in yeast using deep transcriptomics. Nat Commun 2021; 12:604. [PMID: 33504782 PMCID: PMC7841160 DOI: 10.1038/s41467-021-20911-3] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 12/17/2019] [Accepted: 01/04/2021] [Indexed: 01/30/2023]
Abstract
De novo gene origination has recently been established as an important mechanism for the formation of new genes. In organisms with large genomes, intergenic and intronic regions provide plenty of raw material for new transcriptional events, but little is known about how de novo transcripts originate in more densely packed genomes. Here, we identify 213 de novo originated transcripts in Saccharomyces cerevisiae using deep transcriptomics and genomic synteny information from multiple yeast species grown under two different conditions. We find that about half of the de novo transcripts are expressed from regions that already harbor other genes in the opposite orientation; these transcripts show expression changes in response to stress similar to those of their overlapping counterparts, and some appear to be translated into small proteins. Thus, a large fraction of de novo genes in yeast are likely to co-evolve with already existing genes.
29
The new chimeric chiron genes evolved essential roles in zebrafish embryonic development by regulating NAD+ levels. Sci China Life Sci 2021; 64:1929-1948. [PMID: 33521859 DOI: 10.1007/s11427-020-1851-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/18/2020] [Accepted: 11/16/2020] [Indexed: 10/22/2022]
Abstract
The origination of new genes is important for generating genetic novelties for adaptive evolution and biological diversity. However, their potential roles in embryonic development, the evolutionary processes by which they integrate into ancient networks, and their contributions to adaptive evolution remain poorly investigated. Here, we identified a novel chimeric gene family, the chiron family, and explored its genetic basis and functional evolution underlying the adaptive evolution of Danioninae fishes. The ancestral chiron gene originated through retroposition of nampt in Danioninae 48-54 million years ago (Mya) and expanded into five duplicates (chiron1-5) in zebrafish 1-4 Mya. The chiron genes (chirons) were likely first expressed during embryonic development and gradually extended their expression to the testis. Functional experiments showed that chirons are essential for zebrafish embryonic development. By integrating into the NAD+ synthesis pathway, chirons could directly catalyze the rate-limiting reaction of NAD+ synthesis and probably caused two energy metabolism genes (nmnat1 and naprt) to come under positive selection in Danioninae fishes. Together, these results demonstrate that the origin of the new chimeric chiron genes may be involved in adaptive evolution by integrating into and impacting the NAD+ biosynthetic pathway. This coevolution may have contributed to the physiological adaptation of Danioninae fishes to widespread and varied biomes in Southeast Asia.
30
Ruiz-Orera J, Villanueva-Cañas JL, Albà MM. Evolution of new proteins from translated sORFs in long non-coding RNAs. Exp Cell Res 2020; 391:111940. [PMID: 32156600 DOI: 10.1016/j.yexcr.2020.111940] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 09/18/2019] [Revised: 02/26/2020] [Accepted: 03/02/2020] [Indexed: 01/07/2023]
Abstract
High throughput RNA sequencing techniques have revealed that a large fraction of the genome is transcribed into long non-coding RNAs (lncRNAs). Unlike canonical protein-coding genes, lncRNAs do not contain long open reading frames (ORFs) and tend to be poorly conserved across species. However, many of them contain small ORFs (sORFs) that exhibit translation signatures according to ribosome profiling or proteomics data. These sORFs are a source of putative novel proteins; some of them may confer a selective advantage and be maintained over time, a process known as de novo gene birth. Here we review the mechanisms by which randomly occurring sORFs in lncRNAs can become new functional proteins.
Affiliation(s)
- Jorge Ruiz-Orera
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- M Mar Albà
- Evolutionary Genomics Group, Research Programme in Biomedical Informatics, Hospital Del Mar Research Institute (IMIM), Universitat Pompeu Fabra (UPF), Barcelona, Spain; Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, 08010, Spain.
31
Vakirlis N, Acar O, Hsu B, Castilho Coelho N, Van Oss SB, Wacholder A, Medetgul-Ernar K, Bowman RW, Hines CP, Iannotta J, Parikh SB, McLysaght A, Camacho CJ, O'Donnell AF, Ideker T, Carvunis AR. De novo emergence of adaptive membrane proteins from thymine-rich genomic sequences. Nat Commun 2020; 11:781. [PMID: 32034123 PMCID: PMC7005711 DOI: 10.1038/s41467-020-14500-z] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Received: 04/25/2019] [Accepted: 12/20/2019] [Indexed: 11/14/2022]
Abstract
Recent evidence demonstrates that novel protein-coding genes can arise de novo from non-genic loci. This evolutionary innovation is thought to be facilitated by the pervasive translation of non-genic transcripts, which exposes a reservoir of variable polypeptides to natural selection. Here, we systematically characterize how these de novo emerging coding sequences impact fitness in budding yeast. Disruption of emerging sequences is generally inconsequential for fitness in the laboratory and in natural populations. Overexpression of emerging sequences, however, is enriched in adaptive fitness effects compared to overexpression of established genes. We find that adaptive emerging sequences tend to encode putative transmembrane domains, and that thymine-rich intergenic regions harbor a widespread potential to produce transmembrane domains. These findings, together with in-depth examination of the de novo emerging YBR196C-A locus, suggest a novel evolutionary model whereby adaptive transmembrane polypeptides emerge de novo from thymine-rich non-genic regions and subsequently accumulate changes molded by natural selection.
There is increasing evidence that protein-coding genes can emerge de novo from noncoding genomic regions. Vakirlis et al. propose that sequences encoding transmembrane polypeptides can emerge de novo in thymine-rich genomic regions and provide organisms with fitness benefits.
Affiliation(s)
- Nikolaos Vakirlis
- Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin 2, Ireland
- Omer Acar
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Brian Hsu
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA, 92093, United States
- Nelson Castilho Coelho
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- S Branden Van Oss
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Aaron Wacholder
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Kate Medetgul-Ernar
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA, 92093, United States
- Ray W Bowman
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, 15260, United States
- Cameron P Hines
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA, 92093, United States
- John Iannotta
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Saurin Bipin Parikh
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Aoife McLysaght
- Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin 2, Ireland
- Carlos J Camacho
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Allyson F O'Donnell
- Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States; Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, 15260, United States
- Trey Ideker
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA, 92093, United States
- Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
32
Zhang Z, Fan Y, Xiong J, Guo X, Hu K, Wang Z, Gao J, Wen J, Yi B, Shen J, Ma C, Fu T, Xia S, Tu J. Two young genes reshape a novel interaction network in Brassica napus. New Phytol 2020; 225:530-545. [PMID: 31407340 DOI: 10.1111/nph.16113] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/24/2019] [Accepted: 08/06/2019] [Indexed: 06/10/2023]
Abstract
New genes often drive the evolution of gene interaction networks. In Brassica napus, the widely used genic male sterile breeding system 7365ABC is controlled by two young genes, Bnams4b and BnaMs3. However, the interaction mechanism of these two young genes remains unclear. Here, we confirmed that Bnams4b interacts with the nuclear-localised E3 ligase BRUTUS (BTS). Ectopic expression of AtBRUTUS (AtBTS) and comparison between Bnams4b-transgenic Arabidopsis and bts mutants suggested that Bnams4b may drive translocation of BTS to cause various toxic defects. BnaMs3 gained an exclusive interaction with the plastid outer-membrane translocon Toc33 compared with Bnams3 and AtTic40, and specifically compensated for the toxic effects of Bnams4b. Heat shock treatment also rescued the sterile phenotype, and high temperature suppressed the interaction between Bnams4b and BTS in yeast. Furthermore, the ubiquitin system and the accumulation of TOC (translocon at the outer envelope membrane of chloroplasts) components were affected in Bnams4b-transgenic Arabidopsis plants. Taken together, these results indicate that the new chimeric Bnams4b carries BTS from the nucleus to the chloroplast, which may disrupt the normal ubiquitin-proteasome system and cause toxic effects, and that these defects can be compensated by the BnaMs3-Toc33 interaction or by environmental heat shock. This reveals a scenario in which two population-specific, coevolved young genes reshape a novel interaction network in plants.
Affiliation(s)
- Zhiqiang Zhang
- National Key Laboratory of Crop Genetic Improvement, National Center of Rapeseed Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Yu Fan
- National Key Laboratory of Crop Genetic Improvement, National Center of Rapeseed Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Jie Xiong
- National Key Laboratory of Crop Genetic Improvement, National Center of Rapeseed Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Xiang Guo
- National Key Laboratory of Crop Genetic Improvement, National Center of Rapeseed Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Kaining Hu
- National Key Laboratory of Crop Genetic Improvement, National Center of Rapeseed Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Zhixin Wang
- National Key Laboratory of Crop Genetic Improvement, National Center of Rapeseed Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Jie Gao
- National Key Laboratory of Crop Genetic Improvement, National Center of Rapeseed Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Jing Wen
- National Key Laboratory of Crop Genetic Improvement, National Center of Rapeseed Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Bin Yi
- National Key Laboratory of Crop Genetic Improvement, National Center of Rapeseed Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Jinxiong Shen
- National Key Laboratory of Crop Genetic Improvement, National Center of Rapeseed Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Chaozhi Ma
- National Key Laboratory of Crop Genetic Improvement, National Center of Rapeseed Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Tingdong Fu
- National Key Laboratory of Crop Genetic Improvement, National Center of Rapeseed Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Shengqian Xia
- National Key Laboratory of Crop Genetic Improvement, National Center of Rapeseed Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Jinxing Tu
- National Key Laboratory of Crop Genetic Improvement, National Center of Rapeseed Improvement, Huazhong Agricultural University, Wuhan, 430070, China
33
Huang L, Liu H, Wu J, Zhao R, Li Y, Melaku G, Zhang S, Huang G, Bao Y, Ning M, Chen B, Gong Y, Hu Q, Zhang J, Zhang Y. Evolution of Plant Architecture in Oryza Driven by the PROG1 Locus. Front Plant Sci 2020; 11:876. [PMID: 32655603 PMCID: PMC7325765 DOI: 10.3389/fpls.2020.00876] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/05/2020] [Accepted: 05/28/2020] [Indexed: 05/20/2023]
Abstract
The genetic control of plant architecture in crops is critical for agriculture and for understanding morphological evolution. This study showed that an open reading frame (ORF) of the rice domestication gene PROG1 appeared 3.4-3.9 million years ago (Mya). It subsequently acquired a novel protein-coding function in the genome of O. rufipogon (~0.3-0.4 Mya). This extremely young gene and its nearby paralogous C2H2 genes define the prostrate architecture of O. rufipogon and are thus of adaptive significance for wild rice in swamps and water areas. However, selection for dense planting and high yield during rice domestication silenced the PROG1 gene and caused the loss of the RPAD locus containing functional C2H2 paralogs; hence, domesticated lines exhibit an erect plant architecture. Analysis of the stepwise origination of PROG1 and its evolutionary genetics revealed that this zinc-finger coding gene may have evolved rapidly under positive selection and promoted the transition from non- or semi-prostrate growth to prostrate growth. A transgenic assay showed that PROG1 from O. rufipogon exerts a stronger effect than PROG1 sequences from other Oryza species. However, analysis of PROG1 expression levels in different Oryza species suggests that transcriptional regulation of PROG1 has played an important role in its evolution. This study provides the first strong case showing how a fundamental morphological trait in Oryza species evolved, driven by a single gene locus.
Affiliation(s)
- Liyu Huang
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Research Center for Perennial Rice Engineering and Technology of Yunnan, School of Agriculture, Yunnan University, Kunming, China
- Correspondence: Liyu Huang
- Hui Liu
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
- Junjie Wu
- College of Agriculture and Biology Science, Dali University, Dali, China
- Ruoping Zhao
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Getachew Melaku
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Research Center for Perennial Rice Engineering and Technology of Yunnan, School of Agriculture, Yunnan University, Kunming, China
- Agricultural Biotechnology Directorate of the Ethiopian Biotechnology Institute, Addis Ababa, Ethiopia
- Shilai Zhang
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Research Center for Perennial Rice Engineering and Technology of Yunnan, School of Agriculture, Yunnan University, Kunming, China
- Guangfu Huang
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Research Center for Perennial Rice Engineering and Technology of Yunnan, School of Agriculture, Yunnan University, Kunming, China
- Yachong Bao
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Research Center for Perennial Rice Engineering and Technology of Yunnan, School of Agriculture, Yunnan University, Kunming, China
- Min Ning
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Research Center for Perennial Rice Engineering and Technology of Yunnan, School of Agriculture, Yunnan University, Kunming, China
- Benjia Chen
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Research Center for Perennial Rice Engineering and Technology of Yunnan, School of Agriculture, Yunnan University, Kunming, China
- Yurui Gong
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Research Center for Perennial Rice Engineering and Technology of Yunnan, School of Agriculture, Yunnan University, Kunming, China
- Qingyi Hu
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Research Center for Perennial Rice Engineering and Technology of Yunnan, School of Agriculture, Yunnan University, Kunming, China
- Jing Zhang
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Research Center for Perennial Rice Engineering and Technology of Yunnan, School of Agriculture, Yunnan University, Kunming, China
- Yesheng Zhang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- BGI-Baoshan, Baoshan, China
- Correspondence: Yesheng Zhang
34
Xie C, Bekpen C, Künzel S, Keshavarz M, Krebs-Wheaton R, Skrabar N, Ullrich KK, Tautz D. A de novo evolved gene in the house mouse regulates female pregnancy cycles. eLife 2019; 8:44392. [PMID: 31436535 PMCID: PMC6760900 DOI: 10.7554/elife.44392] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 12/13/2018] [Accepted: 08/21/2019] [Indexed: 12/16/2022]
Abstract
The de novo emergence of new genes has been well documented through genomic analyses. However, functional analyses, especially of very young protein-coding genes, are still largely lacking. Here, we identify a set of house mouse-specific protein-coding genes and assess their translation using ribosome profiling and mass spectrometry data. We functionally analyze one of them, Gm13030, which is expressed specifically in the female oviduct. Interrupting its reading frame affects the transcriptional network in the oviduct at a specific stage of the estrous cycle. This includes the upregulation of Dcpp genes, which are known to stimulate the growth of preimplantation embryos. As a consequence, knockout females have their second litters after shorter intervals and show a higher infanticide rate. Given that Gm13030 shows no signs of positive selection, our findings support the hypothesis that a de novo evolved gene can directly adopt a function without much sequence adaptation.
Different species have specific genes that set them apart from other species, yet exactly how these species-specific genes originate is not fully known. The traditional view is that existing old genes are duplicated to make a ‘spare’ copy, which can gradually change through mutation into a new gene with a new role. Although there is plenty of evidence supporting this theory, not all new genes found in recent years can be traced back to older genes. This led to an alternative view: that recently evolved genes can also appear ‘de novo’ from regions of random DNA sequence that did not previously code for a protein. So far, the possibility of genes forming de novo during evolution has largely been supported by comparing and analyzing the genomes of related species. However, very little is known about the biological roles these de novo genes play. Now, Xie et al. have generated a list of recently evolved de novo mouse genes and carried out a detailed analysis of one de novo gene that is expressed in females at the time when embryos implant into the uterine wall. To study the role of this gene, Xie et al. created a strain of knockout mice carrying a defunct version of the protein encoded by the gene. Loss of this protein caused female mice to have their second litter after a shorter period of time and increased the likelihood that they would kill their newborn pups. This suggests that this newly discovered de novo gene is involved in regulating the reproductive cycles of female mice. Further analysis showed that this de novo gene counteracts the action of an older gene that promotes the implantation of embryos. The gene has therefore likely evolved because of the benefit it offers mothers, protecting them from the increased physiological stress of a premature second pregnancy. These findings support the idea that genes that have evolved de novo can have an essential biological purpose despite originating from random DNA sequences, and establish de novo evolution as the second major mechanism by which new genes with significant biological roles can form in the genome.
Affiliation(s)
- Chen Xie
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Cemalettin Bekpen
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Sven Künzel
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Maryam Keshavarz
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Rebecca Krebs-Wheaton
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Neva Skrabar
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Kristian Karsten Ullrich
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Diethard Tautz
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
35
Abstract
The origin of novel genes and beneficial functions is of fundamental interest in evolutionary biology. New genes can originate through different mechanisms, including horizontal gene transfer, duplication-divergence, and de novo emergence from noncoding DNA sequences. Comparative genomics has generated strong evidence for de novo emergence of genes in various organisms, but experimental demonstration of this process has been limited to localized randomization within preexisting structural scaffolds. This bypasses the basic requirement of de novo gene emergence, namely the lack of an ancestral gene. We constructed highly diverse plasmid libraries encoding randomly generated open reading frames and expressed them in Escherichia coli to identify short peptides that could confer a beneficial and selectable phenotype in vivo (in a living cell). Selection on antibiotic-containing agar plates identified three peptides that increased aminoglycoside resistance up to 48-fold. Combining genetic and functional analyses, we show that the peptides are highly hydrophobic and that, by inserting into the membrane, they reduce membrane potential, decrease aminoglycoside uptake, and thereby confer high-level resistance. This study demonstrates that randomized DNA sequences can encode peptides that confer selective benefits and illustrates how expression of random sequences could spark the origination of new genes. Our results also show that this question can be addressed experimentally by expressing highly diverse sequence libraries and then selecting for specific functions, such as resistance to toxic compounds, the ability to rescue auxotrophic or temperature-sensitive mutants, or growth on normally unused carbon sources, allowing many different phenotypes to be explored.
IMPORTANCE De novo gene origination from nonfunctional DNA sequences was long assumed to be implausible. However, recent studies have shown that large fractions of genomic noncoding DNA are transcribed and translated, potentially generating new genes. Experimental validation of this process has so far been limited to comparative genomics, in vitro selections, or partial randomizations. Here, we describe the selection of novel peptides in vivo using fully random synthetic expression libraries. The peptides confer aminoglycoside resistance by inserting into the bacterial membrane, thereby partly reducing membrane potential and decreasing drug uptake. Our results show that beneficial peptides can be selected from random sequence pools in vivo and support the idea that expression of noncoding sequences could spark the origination of new genes.
36
Affiliation(s)
- Stephen Branden Van Oss
- Department of Computational and Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America
- Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America
37
Vakirlis N, Hebert AS, Opulente DA, Achaz G, Hittinger CT, Fischer G, Coon JJ, Lafontaine I. A Molecular Portrait of De Novo Genes in Yeasts. Mol Biol Evol 2018; 35:631-645. [PMID: 29220506 DOI: 10.1093/molbev/msx315] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Indexed: 12/13/2022]
Abstract
New genes, with novel protein functions, can evolve "from scratch" out of intergenic sequences. These de novo genes can integrate into the cell's genetic network and drive important phenotypic innovations. Therefore, identifying de novo genes and understanding how the transition from noncoding to coding occurs are key problems in evolutionary biology. However, identifying de novo genes is a difficult task, hampered by the presence of remote homologs, fast-evolving sequences and erroneously annotated protein-coding genes. To overcome these limitations, we developed a procedure that handles the usual pitfalls in de novo gene identification and predicted the emergence of 703 de novo gene candidates in 15 yeast species from 2 genera whose phylogeny spans at least 100 million years of evolution. We validated 85 candidates by proteomic data, providing new translation evidence for 25 of them through mass spectrometry experiments. We also unambiguously identified the mutations that enabled the transition from noncoding to coding for 30 Saccharomyces de novo genes. We established that de novo gene origination is a widespread phenomenon in yeasts, with only a few de novo genes ultimately maintained by selection. We also found that de novo genes preferentially emerge next to divergent promoters in GC-rich intergenic regions, where the probability of finding a fortuitous and transcribed ORF is the highest. Finally, we found a more than 3-fold enrichment of de novo genes at recombination hot spots, which are GC-rich and nucleosome-free regions, suggesting that meiotic recombination contributes to de novo gene emergence in yeasts.
Affiliation(s)
- Nikolaos Vakirlis
- Sorbonne Universités, UPMC Univ Paris 06, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative UMR7238, 75005 Paris, France
- Alex S Hebert
- Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, WI; DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI
- Dana A Opulente
- Laboratory of Genetics, Genome Center of Wisconsin, J. F. Crow Institute for the Study of Evolution, Wisconsin Energy Institute, University of Wisconsin-Madison, Madison, WI
- Guillaume Achaz
- Atelier de BioInformatique, ISyEB UMR7205 Muséum National d'Histoire Naturelle, Paris, France; SMILE Group, CIRB UMR7241, Collège de France, Paris, France
- Chris Todd Hittinger
- DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI; Laboratory of Genetics, Genome Center of Wisconsin, J. F. Crow Institute for the Study of Evolution, Wisconsin Energy Institute, University of Wisconsin-Madison, Madison, WI
- Gilles Fischer
- Sorbonne Universités, UPMC Univ Paris 06, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative UMR7238, 75005 Paris, France
- Joshua J Coon
- Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, WI; DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI; Department of Biomolecular Chemistry, University of Wisconsin-Madison, Madison, WI; Department of Chemistry, University of Wisconsin-Madison, Madison, WI; Morgridge Institute for Research, Madison, WI
- Ingrid Lafontaine
- Atelier de BioInformatique, ISyEB UMR7205 Muséum National d'Histoire Naturelle, Paris, France; Sorbonne Universités, UPMC Univ Paris 06, CNRS, Institut de Biologie Physico-Chimique, Physiologie Membranaire et Moléculaire du Chloroplaste UMR7141, 75005 Paris, France
38
Abstract
De novo genes, that is, protein-coding genes originating from previously noncoding sequence, have gone from being considered impossibly unlikely to being recognized as an important source of genetic novelty in eukaryotic genomes. It is clear that de novo gene evolution is a rare but consistent feature of eukaryotic genomes, having been detected in every genome studied. However, different studies often use different computational methods, and the numbers and identities of the detected genes vary greatly. Here we present a coherent protocol for the computational identification of de novo genes by comparative genomics. The method described uses homology searches, identification of syntenic regions, and ancestral sequence reconstruction to produce high-confidence candidates with robust evidence of de novo emergence. It is designed to be easy to apply with a basic knowledge of bioinformatic tools, and to be scalable, so that it can be applied to both large and small datasets.
Affiliation(s)
- Nikolaos Vakirlis
- Department of Genetics, Trinity College Dublin, Smurfit Institute of Genetics, University of Dublin, Dublin, Ireland.
- Aoife McLysaght
- Department of Genetics, Trinity College Dublin, Smurfit Institute of Genetics, University of Dublin, Dublin, Ireland
39
Translation of Small Open Reading Frames: Roles in Regulation and Evolutionary Innovation. Trends Genet 2019; 35:186-198. [PMID: 30606460 DOI: 10.1016/j.tig.2018.12.003] [Citation(s) in RCA: 62] [Impact Index Per Article: 10.3] [Received: 10/12/2018] [Accepted: 12/07/2018] [Indexed: 01/01/2023]
Abstract
The translatome can be defined as the sum of the RNA sequences that are translated into proteins in the cell by the ribosomal machinery. Until recently, it was generally assumed that the translatome was essentially restricted to evolutionarily conserved proteins encoded by the set of annotated protein-coding genes. However, it has become increasingly clear that it also includes small regulatory open reading frames (ORFs), functional micropeptides, de novo proteins, and the pervasive translation of likely nonfunctional proteins. Many of these ORFs have been discovered thanks to the development of ribosome profiling, a technique to sequence ribosome-protected RNA fragments. To fully capture the diversity of translated ORFs, we propose a comprehensive classification that includes the new types of translated ORFs in addition to standard proteins.
40
Exaptation at the molecular genetic level. Sci China Life Sci 2019; 62:437-452. [PMID: 30798493 DOI: 10.1007/s11427-018-9447-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 11/10/2018] [Accepted: 12/01/2018] [Indexed: 12/22/2022]
Abstract
The realization that body parts of animals and plants can be recruited or co-opted for novel functions dates back to, or even predates, the observations of Darwin. S.J. Gould and E.S. Vrba recognized a mode of evolution of characters that differs from adaptation. The umbrella term aptation was supplemented with the concept of exaptation. Unlike adaptations, which are restricted to features built by selection for their current role, exaptations are features that currently enhance fitness, even though their present role was not a result of natural selection. Exaptations can also arise from nonaptations; these are characters which had previously been evolving neutrally. All nonaptations are potential exaptations. The concept of exaptation was expanded to the molecular genetic level, which greatly aided understanding of the enormous potential of neutrally evolving repetitive DNA (including transposed elements, formerly considered junk DNA) for the evolution of genes and genomes. The distinction between adaptations and exaptations is outlined in this review, and examples are given. Also elaborated on is the fact that such distinctions are sometimes difficult to determine; this is a widespread phenomenon in biology, where continua abound and clear borders between states and definitions are rare.
41
Lu TC, Leu JY, Lin WC. A Comprehensive Analysis of Transcript-Supported De Novo Genes in Saccharomyces sensu stricto Yeasts. Mol Biol Evol 2017; 34:2823-2838. [PMID: 28981695 PMCID: PMC5850716 DOI: 10.1093/molbev/msx210] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Indexed: 12/16/2022]
Abstract
Novel genes arising from random DNA sequences (de novo genes) have been suggested to be widespread in the genomes of different organisms. However, our knowledge about the origin and evolution of de novo genes is still limited. To systematically understand the general features of de novo genes, we established a robust pipeline to analyze >20,000 transcript-supported coding sequences (CDSs) from the budding yeast Saccharomyces cerevisiae. Our analysis pipeline combined phylogeny, synteny, and sequence alignment information to identify possible orthologs across 20 Saccharomycetaceae yeasts and discovered 4,340 S. cerevisiae-specific de novo genes and 8,871 S. sensu stricto-specific de novo genes. We further combined information on CDS positions and transcript structures to show that >65% of de novo genes arose from transcript isoforms of ancient genes, especially in the upstream and internal regions of ancient genes. Fourteen identified de novo genes with high transcript levels were chosen to verify their protein expression. Ten of them, including eight transcript isoform-associated CDSs, showed translation signals, and five proteins exhibited specific cytosolic localizations. Our results suggest that de novo genes frequently arise in the S. sensu stricto complex and have the potential to be quickly integrated into ancient cellular networks.
Affiliation(s)
- Tzu-Chiao Lu
- Graduate Institute of Life Sciences, National Defense Medical Center, Taipei, Taiwan; Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan; Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Jun-Yi Leu
- Graduate Institute of Life Sciences, National Defense Medical Center, Taipei, Taiwan; Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
- Wen-Chang Lin
- Graduate Institute of Life Sciences, National Defense Medical Center, Taipei, Taiwan; Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
42
Bhandary P, Seetharam AS, Arendsee ZW, Hur M, Wurtele ES. Raising orphans from a metadata morass: A researcher's guide to re-use of public 'omics data. Plant Sci 2018; 267:32-47. [PMID: 29362097 DOI: 10.1016/j.plantsci.2017.10.014] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 07/17/2017] [Revised: 10/07/2017] [Accepted: 10/15/2017] [Indexed: 05/19/2023]
Abstract
More than 15 petabases of raw RNA-seq data are now accessible through public repositories. Acquisition of other 'omics data types is expanding, though most lack a centralized archival repository. Data reuse provides a tremendous opportunity to extract new knowledge from existing experiments, and offers a unique opportunity for robust, multi-'omics analyses by merging metadata (information about experimental design, biological samples, protocols) and data from multiple experiments. We illustrate how predictive research can be accelerated by meta-analysis with a study of orphan (species-specific) genes. Computational predictions are critical to infer orphan function because their coding sequences provide very few clues. The metadata in public databases are often confusing; a test case with Zea mays mRNA-seq data reveals a high proportion of missing, misleading or incomplete metadata. This metadata morass significantly diminishes the insight that can be extracted from these data. We provide tips for data submitters and users, including specific recommendations to improve metadata quality through greater use of controlled vocabularies and through metadata reviews. Finally, we advocate for a unified, straightforward metadata submission and retrieval system.
Affiliation(s)
- Priyanka Bhandary
- Dept. of Genetics Development and Cell Biology, Iowa State University, Ames IA 50010, USA; Center for Metabolic Biology, Iowa State University, Ames, IA 50011, USA
- Arun S Seetharam
- Genome Informatics Facility, Office of Biotechnology, Iowa State University, Ames, IA 50011, USA
- Zebulun W Arendsee
- Dept. of Genetics Development and Cell Biology, Iowa State University, Ames IA 50010, USA; Center for Metabolic Biology, Iowa State University, Ames, IA 50011, USA
- Manhoi Hur
- Dept. of Genetics Development and Cell Biology, Iowa State University, Ames IA 50010, USA; Center for Metabolic Biology, Iowa State University, Ames, IA 50011, USA
- Eve Syrkin Wurtele
- Dept. of Genetics Development and Cell Biology, Iowa State University, Ames IA 50010, USA; Center for Metabolic Biology, Iowa State University, Ames, IA 50011, USA.
43
Shao J, Chen H, Yang D, Jiang M, Zhang H, Wu B, Li J, Yuan L, Liu C. Genome-wide Identification and Characterization of Natural Antisense Transcripts by Strand-specific RNA Sequencing in Ganoderma lucidum. Sci Rep 2017; 7:5711. [PMID: 28720793 PMCID: PMC5515960 DOI: 10.1038/s41598-017-04303-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 10/28/2016] [Accepted: 05/12/2017] [Indexed: 12/13/2022]
Abstract
Ganoderma lucidum is a white-rot fungus best known for its medicinal and ligninolytic activities. To discover the underlying genes responsible for these activities, we identified and characterized the natural antisense transcripts (NATs) using strand-specific (ss) RNA-seq data obtained from the mycelia, primordia and fruiting bodies. NATs were identified using a custom pipeline and then subjected to functional enrichment and differential expression analyses. A total of 1613 cis- and 244 trans- sense and antisense transcripts were identified. Mapping to GO terms and KEGG pathways revealed that NATs were frequently associated with genes of particular functional categories in particular stages. ssRT-qPCR experiments showed that the expression profiles of 30 of 50 (60%) transcripts were highly correlated with those of the RNA-seq results (r ≥ 0.9). Expression profiles of 22 of 25 (88%) pairs of NATs and their sense transcripts (STs) were highly correlated (p ≤ 0.01), with 15 having r ≥ 0.8 and 4 having r ≤ -0.8. Six lignin-modifying genes and their NATs were analyzed in detail. Diverse patterns of differential expression among different stages and positive and negative correlations were observed. These results suggest that NATs are implicated in gene expression regulation in a function-group- and developmental-stage-specific manner through complex mechanisms.
Affiliation(s)
- Junjie Shao
- Key Laboratory of Bioactive Substances and Resource Utilization of Chinese Herbal Medicine from Ministry of Education, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, 100193, P.R. China
- Haimei Chen
- Key Laboratory of Bioactive Substances and Resource Utilization of Chinese Herbal Medicine from Ministry of Education, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, 100193, P.R. China
- Dan Yang
- Key Laboratory of Bioactive Substances and Resource Utilization of Chinese Herbal Medicine from Ministry of Education, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, 100193, P.R. China
- Mei Jiang
- Key Laboratory of Bioactive Substances and Resource Utilization of Chinese Herbal Medicine from Ministry of Education, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, 100193, P.R. China
- Hui Zhang
- Key Laboratory of Bioactive Substances and Resource Utilization of Chinese Herbal Medicine from Ministry of Education, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, 100193, P.R. China
- Bin Wu
- Key Laboratory of Bioactive Substances and Resource Utilization of Chinese Herbal Medicine from Ministry of Education, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, 100193, P.R. China
- Jianqin Li
- Key Laboratory of Bioactive Substances and Resource Utilization of Chinese Herbal Medicine from Ministry of Education, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, 100193, P.R. China
- Lichai Yuan
- Key Laboratory of Bioactive Substances and Resource Utilization of Chinese Herbal Medicine from Ministry of Education, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, 100193, P.R. China
- Chang Liu
- Key Laboratory of Bioactive Substances and Resource Utilization of Chinese Herbal Medicine from Ministry of Education, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, 100193, P.R. China.
44
Dujon BA, Louis EJ. Genome Diversity and Evolution in the Budding Yeasts (Saccharomycotina). Genetics 2017; 206:717-750. [PMID: 28592505 PMCID: PMC5499181 DOI: 10.1534/genetics.116.199216] [Citation(s) in RCA: 79] [Impact Index Per Article: 11.3] [Received: 12/14/2016] [Accepted: 04/03/2017] [Indexed: 12/15/2022]
Abstract
Considerable progress in our understanding of yeast genomes and their evolution has been made over the last decade with the sequencing, analysis, and comparisons of numerous species, strains, or isolates of diverse origins. The role played by yeasts in natural environments as well as in artificial manufactures, combined with the importance of some species as model experimental systems, has sustained this effort. At the same time, their enormous evolutionary diversity (there are yeast species in every subphylum of Dikarya) sparked curiosity but necessitated further efforts to obtain appropriate reference genomes. Today, yeast genomes have been very informative about basic mechanisms of evolution, speciation, hybridization, and domestication, as well as about the molecular machineries underlying them. They are also irreplaceable for investigating in detail the complex relationship between genotypes and phenotypes, with both theoretical and practical implications. This review examines these questions at two distinct levels offered by the broad evolutionary range of yeasts: inside the best-studied Saccharomyces species complex, and across the entire and diversified subphylum of Saccharomycotina. While obviously revealing evolutionary histories at different scales, data converge to a remarkably coherent picture in which one can estimate the relative importance of intrinsic genome dynamics, including gene birth and loss, vs. horizontal genetic accidents in the making of populations. The ease with which novel yeast genomes can now be studied, combined with the already numerous available reference genomes, offers privileged perspectives to further examine these fundamental biological questions using yeasts both as eukaryotic models and as fungi of practical importance.
Affiliation(s)
- Bernard A Dujon
- Department Genomes and Genetics, Institut Pasteur, Centre National de la Recherche Scientifique UMR3525, 75724-CEDEX15 Paris, France
- Université Pierre et Marie Curie UFR927, 75005 Paris, France
- Edward J Louis
- Centre for Genetic Architecture of Complex Traits, University of Leicester, LE1 7RH, United Kingdom
- Department of Genetics, University of Leicester, LE1 7RH, United Kingdom
45
Thompson DA, Cubillos FA. Natural gene expression variation studies in yeast. Yeast 2017; 34:3-17. [PMID: 27668700 DOI: 10.1002/yea.3210] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 07/26/2016] [Revised: 09/16/2016] [Accepted: 09/18/2016] [Indexed: 11/06/2022]
Abstract
The rise of sequence information across different yeast species and strains is driving an increasing number of studies in the emerging field of genomics to associate polymorphic variants, mRNA abundance and phenotypic differences between individuals. Here, we gathered evidence from recent studies covering several layers that define the genotype-phenotype gap, such as mRNA abundance, allele-specific expression and translation efficiency, to demonstrate how genetic variants co-evolve and define an individual's genome. Moreover, we highlight several cases where inter- and intra-specific studies led to opposite conclusions, probably owing to genetic divergence. Future studies in this area will benefit from access to a massive array of well-annotated genomes and new sequencing technologies, which will allow the fine breakdown of the complex layers that delineate the genotype-phenotype map. Copyright © 2016 John Wiley & Sons, Ltd.
Affiliation(s)
- Francisco A Cubillos
- Centro de Estudios en Ciencia y Tecnología de Alimentos, Universidad de Santiago de Chile, Santiago, Chile; Millennium Nucleus for Fungal Integrative and Synthetic Biology; Departamento de Biología, Facultad de Química y Biología, Universidad de Santiago de Chile, Santiago, Chile
46
Multi-step formation, evolution, and functionalization of new cytoplasmic male sterility genes in the plant mitochondrial genomes. Cell Res 2017; 27:130-146. [PMID: 27725674 DOI: 10.1038/cr.2016.115] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Received: 05/31/2016] [Revised: 08/04/2016] [Accepted: 09/01/2016] [Indexed: 01/28/2023]
Abstract
New gene origination is a major source of genomic innovations that confer phenotypic changes and biological diversity. Generation of new mitochondrial genes in plants may cause cytoplasmic male sterility (CMS), which can promote outcrossing and increase fitness. However, how mitochondrial genes originate and evolve in structure and function remains unclear. The rice Wild Abortive type of CMS is conferred by the mitochondrial gene WA352c (previously named WA352) and has been widely exploited in hybrid rice breeding. Here, we reconstruct the evolutionary trajectory of WA352c by the identification and analyses of 11 mitochondrial genomic recombinant structures related to WA352c in wild and cultivated rice. We deduce that these structures arose through multiple rearrangements among conserved mitochondrial sequences in the mitochondrial genome of the wild rice Oryza rufipogon, coupled with substoichiometric shifting and sequence variation. We identify two expressed but nonfunctional protogenes among these structures, and show that they could evolve into functional CMS genes via sequence variations that could relieve the self-inhibitory potential of the proteins. These sequence changes would endow the proteins with the ability to interact with the nucleus-encoded mitochondrial protein COX11, resulting in premature programmed cell death in the anther tapetum and male sterility. Furthermore, we show that the sequences that encode the COX11-interaction domains in these WA352c-related genes have experienced purifying selection during evolution. We propose a model for the formation and evolution of new CMS genes via a "multi-recombination/protogene formation/functionalization" mechanism involving gradual variations in structure, sequence, copy number, and function.
47
McLysaght A, Hurst LD. Open questions in the study of de novo genes: what, how and why. Nat Rev Genet 2016; 17:567-78. [PMID: 27452112 DOI: 10.1038/nrg.2016.78] [Citation(s) in RCA: 125] [Impact Index Per Article: 15.6] [Indexed: 12/12/2022]
Abstract
The study of de novo protein-coding genes is maturing from the ad hoc reporting of individual cases to the systematic analysis of extensive genomic data from several species. We identify three key challenges for this emerging field: understanding how best to identify de novo genes, how they arise and why they spread. We highlight the intellectual challenges of understanding how a de novo gene becomes integrated into pre-existing functions and becomes essential. We suggest that, as with protein sequence evolution, antagonistic co-evolution may be key to de novo gene evolution, particularly for new essential genes and new cancer-associated genes.
Affiliation(s)
- Aoife McLysaght
- The Smurfit Institute of Genetics, University of Dublin, Trinity College, Dublin 2, Ireland
- Laurence D Hurst
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, Somerset BA2 7AY, UK
48
McLysaght A, Guerzoni D. New genes from non-coding sequence: the role of de novo protein-coding genes in eukaryotic evolutionary innovation. Philos Trans R Soc Lond B Biol Sci 2015; 370:20140332. [PMID: 26323763 PMCID: PMC4571571 DOI: 10.1098/rstb.2014.0332] [Citation(s) in RCA: 100] [Impact Index Per Article: 12.5] [Indexed: 11/16/2022]
Abstract
The origin of novel protein-coding genes de novo was once considered so improbable as to be impossible. In less than a decade, and especially in the last five years, this view has been overturned by extensive evidence from diverse eukaryotic lineages. There is now evidence that this mechanism has contributed a significant number of genes to genomes of organisms as diverse as Saccharomyces, Drosophila, Plasmodium, Arabidopsis and human. From simple beginnings, these genes have in some instances acquired complex structure, regulated expression and important functional roles. New genes are often thought of as dispensable late additions; however, some recent de novo genes in human can play a role in disease. Rather than an extremely rare occurrence, it is now evident that there is a relatively constant trickle of proto-genes released into the testing ground of natural selection. It is currently unknown whether de novo genes arise primarily through an ‘RNA-first’ or ‘ORF-first’ pathway. Either way, evolutionary tinkering with this pool of genetic potential may have been a significant player in the origins of lineage-specific traits and adaptations.
Affiliation(s)
- Aoife McLysaght
- Smurfit Institute of Genetics, University of Dublin, Trinity College Dublin, Dublin 2, Republic of Ireland
- Daniele Guerzoni
- Smurfit Institute of Genetics, University of Dublin, Trinity College Dublin, Dublin 2, Republic of Ireland
49
Guerzoni D, McLysaght A. De Novo Genes Arise at a Slow but Steady Rate along the Primate Lineage and Have Been Subject to Incomplete Lineage Sorting. Genome Biol Evol 2016; 8:1222-32. [PMID: 27056411 PMCID: PMC4860702 DOI: 10.1093/gbe/evw074] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Indexed: 12/21/2022]
Abstract
De novo protein-coding gene origination is increasingly recognized as an important evolutionary mechanism. However, there remains a large amount of uncertainty regarding the frequency of these events and the mechanisms and speed of gene establishment. Here, we describe a rigorous search for cases of de novo gene origination in the great apes. We analyzed annotated proteomes as well as full genomic DNA and transcriptional and translational evidence. It is notable that results vary between database updates due to the fluctuating annotation of these genes. Nonetheless, we identified 35 de novo genes: 16 human-specific; 5 human- and chimpanzee-specific; and 14 that originated prior to the divergence of human, chimpanzee, and gorilla and are found in all three genomes. The taxonomically restricted distribution of these genes cannot be explained by loss in other lineages. Each gene is supported by an open reading frame-creating mutation that occurred within the primate lineage, and which is not polymorphic in any species. Similar to previous studies, we find that the de novo genes identified are short and frequently located near pre-existing genes. Also, they may be associated with Alu elements and prior transcription and RNA splicing at the locus. Additionally, we report the first case of apparent incomplete lineage sorting of a de novo gene. The gene is present in human and gorilla, whereas chimpanzee has the ancestral noncoding sequence. This indicates a long period of polymorphism prior to fixation and thus supports a model where de novo genes may, at least initially, have a neutral effect on fitness.
Affiliation(s)
- Daniele Guerzoni
- Smurfit Institute of Genetics, Department of Genetics, Trinity College Dublin, University of Dublin, Ireland
- Aoife McLysaght
- Smurfit Institute of Genetics, Department of Genetics, Trinity College Dublin, University of Dublin, Ireland
50
Digianantonio KM, Hecht MH. A protein constructed de novo enables cell growth by altering gene regulation. Proc Natl Acad Sci U S A 2016; 113:2400-5. [PMID: 26884172 PMCID: PMC4780649 DOI: 10.1073/pnas.1600566113] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Indexed: 12/11/2022]
Abstract
Recent advances in protein design rely on rational and computational approaches to create novel sequences that fold and function. In contrast, natural systems selected functional proteins without any design a priori. In an attempt to mimic nature, we used large libraries of novel sequences and selected for functional proteins that rescue Escherichia coli cells in which a conditionally essential gene has been deleted. In this way, the de novo protein SynSerB3 was selected as a rescuer of cells in which serB, which encodes phosphoserine phosphatase, an enzyme essential for serine biosynthesis, was deleted. However, SynSerB3 does not rescue the deleted activity by catalyzing hydrolysis of phosphoserine. Instead, SynSerB3 up-regulates hisB, a gene encoding histidinol phosphate phosphatase. This endogenous E. coli phosphatase has promiscuous activity that, when overexpressed, compensates for the deletion of phosphoserine phosphatase. Thus, the de novo protein SynSerB3 rescues the deletion of serB by altering the natural regulation of the His operon.
Affiliation(s)
- Michael H Hecht
- Department of Chemistry, Princeton University, Princeton, NJ 08540