51
Lombardo KD, Sheehy HK, Cridland JM, Begun DJ. Identifying candidate de novo genes expressed in the somatic female reproductive tract of Drosophila melanogaster. G3 (Bethesda, Md.) 2023; 13:jkad122. [PMID: 37259569 PMCID: PMC10411569 DOI: 10.1093/g3journal/jkad122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2023] [Revised: 05/18/2023] [Accepted: 05/22/2023] [Indexed: 06/02/2023]
Abstract
Most eukaryotic genes have been vertically transmitted to the present from distant ancestors. However, variable gene number across species indicates that gene gain and loss also occur. While new genes typically originate as products of duplications and rearrangements of preexisting genes, putative de novo genes (genes born out of ancestrally nongenic sequence) have been identified. Previous studies of de novo genes in Drosophila have provided evidence that expression in male reproductive tissues is common. However, no studies have focused on female reproductive tissues. Here we begin addressing this gap in the literature by analyzing the transcriptomes of 3 female reproductive tract organs (spermatheca, seminal receptacle, and parovaria) in 3 species: our focal species, Drosophila melanogaster, and 2 closely related species, Drosophila simulans and Drosophila yakuba, with the goal of identifying putative D. melanogaster-specific de novo genes expressed in these tissues. We discovered several candidate genes, located in sequence annotated as intergenic. Consistent with the literature, these genes tend to be short, single exon, and lowly expressed. We also find evidence that some of these genes are expressed in other D. melanogaster tissues and both sexes. The relatively small number of intergenic candidate genes discovered here is similar to that observed in the accessory gland, but substantially fewer than that observed in the testis.
Affiliation(s)
- Kaelina D Lombardo
- Department of Evolution and Ecology, University of California Davis, Davis, CA 95616, USA
- Hayley K Sheehy
- Department of Evolution and Ecology, University of California Davis, Davis, CA 95616, USA
- Julie M Cridland
- Department of Evolution and Ecology, University of California Davis, Davis, CA 95616, USA
- David J Begun
- Department of Evolution and Ecology, University of California Davis, Davis, CA 95616, USA
52
Yocca AE, Platts A, Alger E, Teresi S, Mengist MF, Benevenuto J, Ferrão LFV, Jacobs M, Babinski M, Magallanes-Lundback M, Bayer P, Golicz A, Humann JL, Main D, Espley RV, Chagné D, Albert NW, Montanari S, Vorsa N, Polashock J, Díaz-Garcia L, Zalapa J, Bassil NV, Munoz PR, Iorizzo M, Edger PP. Blueberry and cranberry pangenomes as a resource for future genetic studies and breeding efforts. bioRxiv 2023:2023.07.31.551392. [PMID: 37577683 PMCID: PMC10418200 DOI: 10.1101/2023.07.31.551392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
Domestication of cranberry and blueberry began in the United States in the early 1800s and 1900s, respectively, and, owing in part to their flavors and health-promoting benefits, both crops are now cultivated and consumed worldwide. The industry continues to face a wide variety of production challenges (e.g., disease pressures) as well as a demand for higher-yielding cultivars with improved fruit quality characteristics. Unfortunately, molecular tools to help guide breeding efforts for these species have been relatively limited compared with those for other high-value crops. Here, we describe the construction and analysis of the first pangenome for both blueberry and cranberry. Our analysis of these pangenomes revealed both crops exhibit great genetic diversity, including the presence-absence variation of 48.4% of genes in highbush blueberry and 47.0% of genes in cranberry. Auxiliary genes, those not shared by all cultivars, are significantly enriched with molecular functions associated with disease resistance and the biosynthesis of specialized metabolites, including compounds previously associated with improving fruit quality traits. The discovery of thousands of genes, not present in the previous reference genomes for blueberry and cranberry, will serve as the basis of future research and as potential targets for future breeding efforts. The pangenome, as a multiple-sequence alignment, as well as individual annotated genomes, are publicly available for analysis on the Genome Database for Vaccinium, a curated and integrated web-based relational database. Lastly, the core-gene predictions from the pangenomes will be useful for developing a community genotyping platform to guide future molecular breeding efforts across the family.
Affiliation(s)
- Alan E. Yocca
- Department of Horticulture, Michigan State University, East Lansing, MI, 48824, USA
- Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA
- Adrian Platts
- Department of Horticulture, Michigan State University, East Lansing, MI, 48824, USA
- Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA
- Elizabeth Alger
- Department of Horticulture, Michigan State University, East Lansing, MI, 48824, USA
- Scott Teresi
- Department of Horticulture, Michigan State University, East Lansing, MI, 48824, USA
- Genetics and Genome Sciences, Michigan State University, East Lansing, MI, 48824, USA
- Molla F. Mengist
- Plants for Human Health Institute, North Carolina State University, Kannapolis, NC USA
- Juliana Benevenuto
- Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA
- Luis Felipe V. Ferrão
- Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA
- MacKenzie Jacobs
- Department of Horticulture, Michigan State University, East Lansing, MI, 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA
- Michal Babinski
- Department of Horticulture, Michigan State University, East Lansing, MI, 48824, USA
- Philipp Bayer
- University of Western Australia, Perth 6009 Australia
- Jodi L Humann
- Department of Horticulture, Washington State University, Pullman, WA, 99163, USA
- Dorrie Main
- Department of Horticulture, Washington State University, Pullman, WA, 99163, USA
- Richard V. Espley
- The New Zealand Institute for Plant and Food Research Limited (PFR), Auckland, New Zealand
- David Chagné
- The New Zealand Institute for Plant and Food Research Limited (PFR), Palmerston, New Zealand
- Nick W. Albert
- The New Zealand Institute for Plant and Food Research Limited (PFR), Palmerston, New Zealand
- Sara Montanari
- The New Zealand Institute for Plant and Food Research Limited (PFR), Motueka, New Zealand
- Nicholi Vorsa
- SEBS, Plant Biology, Rutgers University, New Brunswick NJ 01019 USA
- James Polashock
- SEBS, Plant Biology, Rutgers University, New Brunswick NJ 01019 USA
- Luis Díaz-Garcia
- USDA-ARS, VCRU, Department of Horticulture, University of Wisconsin-Madison, Madison, WI 53706, USA
- Juan Zalapa
- USDA-ARS, VCRU, Department of Horticulture, University of Wisconsin-Madison, Madison, WI 53706, USA
- Nahla V. Bassil
- USDA-ARS, National Clonal Germplasm Repository, Corvallis, OR 97333, USA
- Patricio R. Munoz
- Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA
- Massimo Iorizzo
- Plants for Human Health Institute, North Carolina State University, Kannapolis, NC USA
- Department of Horticulture, North Carolina State University, Kannapolis, NC USA
- Patrick P. Edger
- Department of Horticulture, Michigan State University, East Lansing, MI, 48824, USA
- Genetics and Genome Sciences, Michigan State University, East Lansing, MI, 48824, USA
- MSU AgBioResearch, Michigan State University, East Lansing, MI, 48824, USA
53
Athanasouli M, Akduman N, Röseler W, Theam P, Rödelsperger C. Thousands of Pristionchus pacificus orphan genes were integrated into developmental networks that respond to diverse environmental microbiota. PLoS Genet 2023; 19:e1010832. [PMID: 37399201 DOI: 10.1371/journal.pgen.1010832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Accepted: 06/15/2023] [Indexed: 07/05/2023]
Abstract
Adaptation of organisms to environmental change may be facilitated by the creation of new genes. New genes without homologs in other lineages are known as taxonomically-restricted orphan genes and may result from divergence or de novo formation. Previously, we have extensively characterized the evolution and origin of such orphan genes in the nematode model organism Pristionchus pacificus. Here, we employ large-scale transcriptomics to establish potential functional associations and to measure the degree of transcriptional plasticity among orphan genes. Specifically, we analyzed 24 RNA-seq samples from adult P. pacificus worms raised on 24 different monoxenic bacterial cultures. Based on coexpression analysis, we identified 28 large modules that harbor 3,727 diplogastrid-specific orphan genes and that respond dynamically to different bacteria. These coexpression modules have distinct regulatory architecture and also exhibit differential expression patterns across development, suggesting a link between bacterial response networks and development. Phylostratigraphy revealed a considerably high number of family- and even species-specific orphan genes in certain coexpression modules. This suggests that new genes are not attached randomly to existing cellular networks and that integration can happen very fast. Integrative analysis of protein domains, gene expression and ortholog data facilitated the assignment of biological labels for 22 coexpression modules, with one of the largest, fast-evolving modules being associated with spermatogenesis. In summary, this work presents the first functional annotation for thousands of P. pacificus orphan genes and reveals insights into their integration into environmentally responsive gene networks.
Affiliation(s)
- Marina Athanasouli
- Department for Integrative Evolutionary Biology, Max Planck Institute for Biology, Tübingen, Germany
- Nermin Akduman
- Department for Integrative Evolutionary Biology, Max Planck Institute for Biology, Tübingen, Germany
- Waltraud Röseler
- Department for Integrative Evolutionary Biology, Max Planck Institute for Biology, Tübingen, Germany
- Penghieng Theam
- Department for Integrative Evolutionary Biology, Max Planck Institute for Biology, Tübingen, Germany
- Christian Rödelsperger
- Department for Integrative Evolutionary Biology, Max Planck Institute for Biology, Tübingen, Germany
54
Wang YH, Liu PZ, Liu H, Zhang RR, Liang Y, Xu ZS, Li XJ, Luo Q, Tan GF, Wang GL, Xiong AS. Telomere-to-telomere carrot (Daucus carota) genome assembly reveals carotenoid characteristics. Horticulture Research 2023; 10:uhad103. [PMID: 37786729 PMCID: PMC10541555 DOI: 10.1093/hr/uhad103] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 04/03/2023] [Accepted: 05/08/2023] [Indexed: 10/04/2023]
Abstract
Carrot (Daucus carota) is an Apiaceae plant with multi-colored fleshy roots that provides a model system for carotenoid research. In this study, we assembled a 430.40 Mb high-quality gapless genome to the telomere-to-telomere (T2T) level of "Kurodagosun" carrot. In total, 36 268 genes were identified and 34 961 of them were functionally annotated. The proportion of repeat sequences in the genome was 55.3%, mainly long terminal repeats. Depending on the coverage of the repeats, 14 telomeres and 9 centromeric regions on the chromosomes were predicted. A phylogenetic analysis showed that carrots evolved early in the family Apiaceae. Based on the T2T genome, we reconstructed the carotenoid metabolic pathway and identified the structural genes that regulate carotenoid biosynthesis. Among the 65 genes that were screened, 9 were newly identified. Additionally, some gene sequences overlapped with transposons, suggesting replication and functional differentiation of carotenoid-related genes during carrot evolution. Given that some gene copies were barely expressed during development, they might be functionally redundant. Comparison of 24 cytochrome P450 genes associated with carotenoid biosynthesis revealed tandem or proximal duplication resulting in expansion of the CYP gene family. These results provide molecular information on carrot carotenoid accumulation and contribute a new genetic resource.
Affiliation(s)
- Ya-Hui Wang
- State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, Ministry of Agriculture and Rural Affairs Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops in East China, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Pei-Zhuo Liu
- State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, Ministry of Agriculture and Rural Affairs Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops in East China, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Hui Liu
- State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, Ministry of Agriculture and Rural Affairs Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops in East China, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Rong-Rong Zhang
- State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, Ministry of Agriculture and Rural Affairs Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops in East China, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Yi Liang
- Beijing Vegetable Research Center, Beijing Academy of Agriculture and Forestry Sciences, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops in North China, Beijing 100097, China
- Zhi-Sheng Xu
- State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, Ministry of Agriculture and Rural Affairs Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops in East China, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Xiao-Jie Li
- Beijing Vegetable Research Center, Beijing Academy of Agriculture and Forestry Sciences, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops in North China, Beijing 100097, China
- Qing Luo
- Institute of Horticulture, Guizhou Academy of Agricultural Sciences, Guiyang, Guizhou 550025, China
- Guo-Fei Tan
- Institute of Horticulture, Guizhou Academy of Agricultural Sciences, Guiyang, Guizhou 550025, China
- Guang-Long Wang
- School of Life Science and Food Engineering, Huaiyin Institute of Technology, Huaian, Jiangsu 223003, China
- Ai-Sheng Xiong
- State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, Ministry of Agriculture and Rural Affairs Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops in East China, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
55
Peng J, Zhao L. The origin and structural evolution of de novo genes in Drosophila. bioRxiv 2023:2023.03.13.532420. [PMID: 37425675 PMCID: PMC10326970 DOI: 10.1101/2023.03.13.532420] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 07/11/2023]
Abstract
Although previously thought to be unlikely, recent studies have shown that de novo gene origination from previously non-genic sequences is a relatively common mechanism for gene innovation in many species and taxa. These young genes provide a unique set of candidates to study the structural and functional origination of proteins. However, our understanding of their protein structures and how these structures originate and evolve is still limited, due to a lack of systematic studies. Here, we combined high-quality base-level whole genome alignments, bioinformatic analysis, and computational structure modeling to study the origination, evolution, and protein structure of lineage-specific de novo genes. We identified 555 de novo gene candidates in D. melanogaster that originated within the Drosophilinae lineage. We found a gradual shift in sequence composition, evolutionary rates, and expression patterns with their gene ages, which indicates possible gradual shifts or adaptations of their functions. Surprisingly, we found little overall protein structural change for de novo genes in the Drosophilinae lineage. Using AlphaFold2, ESMFold, and molecular dynamics, we identified a number of de novo gene candidates with protein products that are potentially well-folded, many of which are more likely to contain transmembrane and signal proteins compared to other annotated protein-coding genes. Using ancestral sequence reconstruction, we found that most potentially well-folded proteins are often born folded. Interestingly, we observed one case where disordered ancestral proteins become ordered within a relatively short evolutionary time. Single-cell RNA-seq analysis in testis showed that although most de novo genes are enriched in spermatocytes, several young de novo genes are biased in the early spermatogenesis stage, indicating potentially important but less emphasized roles of early germline cells in the de novo gene origination in testis. This study provides a systematic overview of the origin, evolution, and structural changes of Drosophilinae-specific de novo genes.
Affiliation(s)
- Junhui Peng
- Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY 10065, USA
- Li Zhao
- Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY 10065, USA
56
Mohsen JJ, Slavoff SA. Noncoding translation: Quality control in the BAG. Mol Cell 2023; 83:1967-1969. [PMID: 37327774 DOI: 10.1016/j.molcel.2023.05.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Revised: 05/25/2023] [Accepted: 05/25/2023] [Indexed: 06/18/2023]
Abstract
Translation of noncoding regions is ubiquitous and upregulated in disease. Kesner et al. [1] elucidate the mechanism by which the BAG6 complex exerts quality control over noncoding translation while targeting stable, noncanonical polypeptides to cellular membranes.
Affiliation(s)
- Jessica J Mohsen
- Department of Chemistry, Yale University, New Haven, CT, USA; Institute for Biomolecular Design and Discovery, Yale University, West Haven, CT, USA
- Sarah A Slavoff
- Department of Chemistry, Yale University, New Haven, CT, USA; Institute for Biomolecular Design and Discovery, Yale University, West Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
57
Fakhar AZ, Liu J, Pajerowska-Mukhtar KM, Mukhtar MS. The Lost and Found: Unraveling the Functions of Orphan Genes. J Dev Biol 2023; 11:27. [PMID: 37367481 PMCID: PMC10299390 DOI: 10.3390/jdb11020027] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/08/2023] [Revised: 05/19/2023] [Accepted: 05/26/2023] [Indexed: 06/28/2023]
Abstract
Orphan Genes (OGs) are a mysterious class of genes that have recently gained significant attention. Despite lacking a clear evolutionary history, they are found in nearly all living organisms, from bacteria to humans, and they play important roles in diverse biological processes. The discovery of OGs was first made through comparative genomics followed by the identification of unique genes across different species. OGs tend to be more prevalent in species with larger genomes, such as plants and animals, and their evolutionary origins remain unclear but potentially arise from gene duplication, horizontal gene transfer (HGT), or de novo origination. Although their precise function is not well understood, OGs have been implicated in crucial biological processes such as development, metabolism, and stress responses. To better understand their significance, researchers are using a variety of approaches, including transcriptomics, functional genomics, and molecular biology. This review offers a comprehensive overview of the current knowledge of OGs in all domains of life, highlighting the possible role of dark transcriptomics in their evolution. More research is needed to fully comprehend the role of OGs in biology and their impact on various biological processes.
Affiliation(s)
- M. Shahid Mukhtar
- Department of Biology, University of Alabama at Birmingham, 1300 University Blvd., Birmingham, AL 35294, USA
58
Grandchamp A, Kühl L, Lebherz M, Brüggemann K, Parsch J, Bornberg-Bauer E. Population genomics reveals mechanisms and dynamics of de novo expressed open reading frame emergence in Drosophila melanogaster. Genome Res 2023; 33:872-890. [PMID: 37442576 PMCID: PMC10519401 DOI: 10.1101/gr.277482.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Accepted: 06/06/2023] [Indexed: 07/15/2023]
Abstract
Novel genes are essential for evolutionary innovations and differ substantially even between closely related species. Recently, multiple studies across many taxa showed that some novel genes arise de novo, that is, from previously noncoding DNA. To characterize the underlying mutations that allowed de novo gene emergence and their order of occurrence, homologous regions must be detected within noncoding sequences in closely related sister genomes. So far, most studies do not detect noncoding homologs of de novo genes because of incomplete assemblies and annotations, and long evolutionary distances separating genomes. Here, we overcome these issues by searching for de novo expressed open reading frames (neORFs), the not-yet fixed precursors of de novo genes that emerged within a single species. We sequenced and assembled genomes with long-read technology and the corresponding transcriptomes from inbred lines of Drosophila melanogaster, derived from seven geographically diverse populations. We found line-specific neORFs in abundance but few neORFs shared by lines, suggesting a rapid turnover. Gain and loss of transcription is more frequent than the creation of ORFs, for example, by forming new start and stop codons. Consequently, the gain of ORFs becomes rate limiting and is frequently the initial step in neORFs emergence. Furthermore, transposable elements (TEs) are major drivers for intragenomic duplications of neORFs, yet TE insertions are less important for the emergence of neORFs. However, highly mutable genomic regions around TEs provide new features that enable gene birth. In conclusion, neORFs have a high birth-death rate, are rapidly purged, but surviving neORFs spread neutrally through populations and within genomes.
Affiliation(s)
- Anna Grandchamp
- Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany
- Lucas Kühl
- Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany
- Marie Lebherz
- Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany
- Kathrin Brüggemann
- Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany
- John Parsch
- Division of Evolutionary Biology, Faculty of Biology, Ludwig-Maximilians-Universität München, 82152 Munich, Germany
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany
- Max Planck Institute for Biology Tübingen, Department of Protein Evolution, 72076 Tübingen, Germany
59
Sanejouand YH. On the Unknown Proteins of Eukaryotic Proteomes. J Mol Evol 2023:10.1007/s00239-023-10116-1. [PMID: 37219573 DOI: 10.1007/s00239-023-10116-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 05/07/2023] [Indexed: 05/24/2023]
Abstract
To study unknown proteins on a large scale, a reference system has been set up for the three better studied eukaryotic kingdoms, built with 36 proteomes as taxonomically diverse as possible. Proteins from 362 other eukaryotic proteomes with no known homologue in this set were then analyzed, focusing notably on singletons, that is, on proteins with no known homologue in their own proteome. Consistently, for a given species, no more than 12% of the singletons thus found are known at the protein level, according to UniProt. In addition, since they rely on the information found in the alignment of homologous sequences, predictions of AlphaFold2 for their three-dimensional structure are poor. In the case of metazoan species, the number of singletons rarely exceeds 1000 for the species closest to the reference system (divergence times below 75 Myr). Interestingly, in the cases of viridiplantae and fungi, larger numbers of singletons are found for such species, as if the timescale on which singletons are added to proteomes were different in metazoa and in other eukaryotic kingdoms. In order to confirm this phenomenon, further studies of proteomes closer to those of the reference system are, however, needed.
Affiliation(s)
- Yves-Henri Sanejouand
- US2B, UMR 6286 of CNRS, Nantes University, rue de la Houssinière, 44322, Nantes, France
60
Wacholder A, Parikh SB, Coelho NC, Acar O, Houghton C, Chou L, Carvunis AR. A vast evolutionarily transient translatome contributes to phenotype and fitness. Cell Syst 2023; 14:363-381.e8. [PMID: 37164009 PMCID: PMC10348077 DOI: 10.1016/j.cels.2023.04.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/18/2022] [Revised: 01/30/2023] [Accepted: 04/06/2023] [Indexed: 05/12/2023]
Abstract
Translation is the process by which ribosomes synthesize proteins. Ribosome profiling recently revealed that many short sequences previously thought to be noncoding are pervasively translated. To identify protein-coding genes in this noncanonical translatome, we combine an integrative framework for extremely sensitive ribosome profiling analysis, iRibo, with high-powered selection inferences tailored for short sequences. We construct a reference translatome for Saccharomyces cerevisiae comprising 5,400 canonical and almost 19,000 noncanonical translated elements. Only 14 noncanonical elements were evolving under detectable purifying selection. A representative subset of translated elements lacking signatures of selection demonstrated involvement in processes including DNA repair, stress response, and post-transcriptional regulation. Our results suggest that most translated elements are not conserved protein-coding genes and contribute to genotype-phenotype relationships through fast-evolving molecular mechanisms.
Affiliation(s)
- Aaron Wacholder
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Saurin Bipin Parikh
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Integrative Systems Biology Program, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Nelson Castilho Coelho
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Omer Acar
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Joint CMU-Pitt PhD Program in Computational Biology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Carly Houghton
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Joint CMU-Pitt PhD Program in Computational Biology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Lin Chou
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Integrative Systems Biology Program, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA.
61
Lombardo KD, Sheehy HK, Cridland JM, Begun DJ. Identifying candidate de novo genes expressed in the somatic female reproductive tract of Drosophila melanogaster. bioRxiv 2023:2023.05.03.539262. [PMID: 37205537 PMCID: PMC10187257 DOI: 10.1101/2023.05.03.539262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
Most eukaryotic genes have been vertically transmitted to the present from distant ancestors. However, variable gene number across species indicates that gene gain and loss also occur. While new genes typically originate as products of duplications and rearrangements of pre-existing genes, putative de novo genes (genes born out of previously non-genic sequence) have been identified. Previous studies of de novo genes in Drosophila have provided evidence that expression in male reproductive tissues is common. However, no studies have focused on female reproductive tissues. Here we begin addressing this gap in the literature by analyzing the transcriptomes of three female reproductive tract organs (spermatheca, seminal receptacle, and parovaria) in three species: our focal species, D. melanogaster, and two closely related species, D. simulans and D. yakuba, with the goal of identifying putative D. melanogaster-specific de novo genes expressed in these tissues. We discovered several candidate genes, which, consistent with the literature, tend to be short, simple, and lowly expressed. We also find evidence that some of these genes are expressed in other D. melanogaster tissues and both sexes. The relatively small number of candidate genes discovered here is similar to that observed in the accessory gland, but substantially fewer than that observed in the testis.
Affiliation(s)
- Kaelina D Lombardo
- Department of Evolution and Ecology, University of California, Davis CA 95616
| | - Hayley K Sheehy
- Department of Evolution and Ecology, University of California, Davis CA 95616
| | - Julie M Cridland
- Department of Evolution and Ecology, University of California, Davis CA 95616
| | - David J Begun
- Department of Evolution and Ecology, University of California, Davis CA 95616
| |
|
62
|
Kesner JS, Chen Z, Shi P, Aparicio AO, Murphy MR, Guo Y, Trehan A, Lipponen JE, Recinos Y, Myeku N, Wu X. Noncoding translation mitigation. Nature 2023; 617:395-402. [PMID: 37046090 PMCID: PMC10560126 DOI: 10.1038/s41586-023-05946-4] [Citation(s) in RCA: 36] [Impact Index Per Article: 36.0] [Received: 07/27/2022] [Accepted: 03/13/2023] [Indexed: 04/14/2023]
Abstract
Translation is pervasive outside of canonical coding regions, occurring in long noncoding RNAs, canonical untranslated regions and introns [1-4], especially in ageing [4-6], neurodegeneration [5,7] and cancer [8-10]. Notably, the majority of tumour-specific antigens are results of noncoding translation [11-13]. Although the resulting polypeptides are often nonfunctional, translation of noncoding regions is nonetheless necessary for the birth of new coding sequences [14,15]. The mechanisms underlying the surveillance of translation in diverse noncoding regions and how escaped polypeptides evolve new functions remain unclear [10,16-19]. Functional polypeptides derived from annotated noncoding sequences often localize to membranes [20,21]. Here we integrate massively parallel analyses of more than 10,000 human genomic sequences and millions of random sequences with genome-wide CRISPR screens, accompanied by in-depth genetic and biochemical characterizations. Our results show that the intrinsic nucleotide bias in the noncoding genome and in the genetic code frequently results in polypeptides with a hydrophobic C-terminal tail, which is captured by the ribosome-associated BAG6 membrane protein triage complex for either proteasomal degradation or membrane targeting. By contrast, canonical proteins have evolved to deplete C-terminal hydrophobic residues. Our results reveal a fail-safe mechanism for the surveillance of unwanted translation from diverse noncoding regions and suggest a possible biochemical route for the preferential membrane localization of newly evolved proteins.
Affiliation(s)
- Jordan S Kesner
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
| | - Ziheng Chen
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Peiguo Shi
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
| | - Alexis O Aparicio
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
| | - Michael R Murphy
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
| | - Yang Guo
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
| | - Aditi Trehan
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
| | - Jessica E Lipponen
- Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Department of Pathology and Cell Biology, Columbia University Irving Medical Center, New York, NY, USA
| | - Yocelyn Recinos
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
| | - Natura Myeku
- Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Department of Pathology and Cell Biology, Columbia University Irving Medical Center, New York, NY, USA
| | - Xuebing Wu
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA.
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA.
| |
|
63
|
Jain N, Richter F, Adzhubei I, Sharp AJ, Gelb BD. Small open reading frames: a comparative genetics approach to validation. BMC Genomics 2023; 24:226. [PMID: 37127568 PMCID: PMC10152738 DOI: 10.1186/s12864-023-09311-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Accepted: 04/13/2023] [Indexed: 05/03/2023] Open
Abstract
Open reading frames (ORFs) with fewer than 100 codons are generally not annotated in genomes, although bona fide genes of that size are known. Newer biochemical studies have suggested that thousands of small protein-coding ORFs (smORFs) may exist in the human genome, but the true number and the biological significance of the micropeptides they encode remain uncertain. Here, we used a comparative genomics approach to identify high-confidence smORFs that are likely protein-coding. We identified 3,326 high-confidence smORFs using constraint within human populations and evolutionary conservation as additional lines of evidence. Next, we validated that, as a group, our high-confidence smORFs are conserved at the amino-acid level rather than merely residing in highly conserved non-coding regions. Finally, we found that high-confidence smORFs are enriched among disease-associated variants from GWAS. Overall, our results highlight that smORF-encoded peptides likely have important functional roles in human disease.
Affiliation(s)
- Niyati Jain
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, 1470 Madison Avenue, New York, NY, 10029, USA
- Present Address: Committee on Genetics, Genomics, and Systems Biology, The University of Chicago, Chicago, IL, USA
| | - Felix Richter
- Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Ivan Adzhubei
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Genetics, Brigham and Women's Hospital, Boston, MA, USA
| | - Andrew J Sharp
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, 1470 Madison Avenue, New York, NY, 10029, USA
| | - Bruce D Gelb
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, 1470 Madison Avenue, New York, NY, 10029, USA.
- Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
| |
|
64
|
Liu J, Yuan R, Shao W, Wang J, Silman I, Sussman JL. Do "Newly Born" orphan proteins resemble "Never Born" proteins? A study using three deep learning algorithms. Proteins 2023. [PMID: 37092778 DOI: 10.1002/prot.26496] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/02/2022] [Revised: 02/26/2023] [Accepted: 04/01/2023] [Indexed: 04/25/2023]
Abstract
"Newly Born" proteins, devoid of detectable homology to any other proteins, known as orphan proteins, occur in a single species or within a taxonomically restricted gene family. They are generated by the expression of novel open reading frames, and appear throughout evolution. We were curious if three recently developed programs for predicting protein structures, namely, AlphaFold2, RoseTTAFold, and ESMFold, might be of value for comparison of such "Newly Born" proteins to random polypeptides with amino acid content similar to that of native proteins, which have been called "Never Born" proteins. The programs were used to compare the structures of two sets of "Never Born" proteins that had been expressed-Group 1, which had been shown experimentally to possess substantial secondary structure, and Group 3, which had been shown to be intrinsically disordered. Overall, although the models generated were scored as being of low quality, they nevertheless revealed some general principles. Specifically, all four members of Group 1 were predicted to be compact by all three algorithms, in agreement with the experimental data, whereas the members of Group 3 were predicted to be very extended, as would be expected for intrinsically disordered proteins, again consistent with the experimental data. These predicted differences were shown to be statistically significant by comparing their accessible surface areas. The three programs were then used to predict the structures of three orphan proteins whose crystal structures had been solved, two of which display novel folds. Surprisingly, only for the protein which did not have a novel fold, and was taxonomically restricted, rather than being a true orphan, did all three algorithms predict very similar, high-quality structures, closely resembling the crystal structure. Finally, they were used to predict the structures of seven orphan proteins with well-identified biological functions, whose 3D structures are not known. 
Two proteins, which were predicted to be disordered based on their sequences, are predicted by all three structure algorithms to be extended structures. The other five were predicted to be compact structures with only two exceptions in the case of AlphaFold2. All three prediction algorithms make remarkably similar and high-quality predictions for one large protein, HCO_11565, from a nematode. It is conjectured that this is due to many homologs in the taxonomically restricted family of which it is a member, and to the fact that the Dali server revealed several nonrelated proteins with similar folds. An animated Interactive 3D Complement (I3DC) is available in Proteopedia at http://proteopedia.org/w/Journal:Proteins:3.
Affiliation(s)
- Jing Liu
- Department of Biotechnology and Food Engineering, Guangdong Technion-Israel Institute of Technology, Shantou, China
- Faculty of Biotechnology and Food Engineering, Technion-Israel Institute of Technology, Haifa, Israel
| | - Rongqing Yuan
- Department of Chemistry, Tsinghua University, Beijing, China
| | - Wei Shao
- School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Jitong Wang
- Department of Chemistry, Tsinghua University, Beijing, China
| | - Israel Silman
- Department of Brain Sciences, The Weizmann Institute of Science, Rehovot, Israel
| | - Joel L Sussman
- Department of Chemical and Structural Biology, The Weizmann Institute of Science, Rehovot, Israel
| |
|
65
|
Sumner S, Favreau E, Geist K, Toth AL, Rehan SM. Molecular patterns and processes in evolving sociality: lessons from insects. Philos Trans R Soc Lond B Biol Sci 2023; 378:20220076. [PMID: 36802779 PMCID: PMC9939270 DOI: 10.1098/rstb.2022.0076] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/26/2022] [Accepted: 09/16/2022] [Indexed: 02/21/2023] Open
Abstract
Social insects have provided some of the clearest insights into the origins and evolution of collective behaviour. Over 20 years ago, Maynard Smith and Szathmáry defined the most complex form of insect social behaviour, superorganismality, among the eight major transitions in evolution that explain the emergence of biological complexity. However, the mechanistic processes underlying the transition from solitary life to superorganismal living in insects remain rather elusive. An overlooked question is whether this major transition arose via incremental or step-wise modes of evolution. We suggest that examination of the molecular processes underpinning different levels of social complexity represented across the major transition from solitary to complex sociality can help address this question. We present a framework for using molecular data to assess to what extent the mechanistic processes that take place in the major transition to complex sociality and superorganismality involve nonlinear (implying step-wise evolution) or linear (implying incremental evolution) changes in the underlying molecular mechanisms. We assess the evidence for these two modes using data from social insects and discuss how this framework can be used to test the generality of molecular patterns and processes across other major transitions. This article is part of a discussion meeting issue 'Collective behaviour through time'.
Affiliation(s)
- Seirian Sumner
- Centre for Biodiversity and Environmental Research, Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK
| | - Emeline Favreau
- Centre for Biodiversity and Environmental Research, Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK
| | - Katherine Geist
- Department of Ecology, Evolution and Organismal Biology, and Department of Entomology, Iowa State University, Ames, IA 50011, USA
| | - Amy L. Toth
- Department of Ecology, Evolution and Organismal Biology, and Department of Entomology, Iowa State University, Ames, IA 50011, USA
| | - Sandra M. Rehan
- Department of Biology, York University, Toronto, Canada M3J 1P3
| |
|
66
|
Iyengar BR, Bornberg-Bauer E. Neutral Models of De Novo Gene Emergence Suggest that Gene Evolution has a Preferred Trajectory. Mol Biol Evol 2023; 40:msad079. [PMID: 37011142 PMCID: PMC10118301 DOI: 10.1093/molbev/msad079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2023] [Revised: 03/01/2023] [Accepted: 03/28/2023] [Indexed: 04/05/2023] Open
Abstract
New protein coding genes can emerge from genomic regions that previously did not contain any genes, via a process called de novo gene emergence. To synthesize a protein, DNA must be transcribed as well as translated. Both processes need certain DNA sequence features. Stable transcription requires promoters and a polyadenylation signal, while translation requires at least an open reading frame. We develop mathematical models based on mutation probabilities, and the assumption of neutral evolution, to find out how quickly genes emerge and are lost. We also investigate the effect of the order by which DNA features evolve, and if sequence composition is biased by mutation rate. We rationalize how genes are lost much more rapidly than they emerge, and how they preferentially arise in regions that are already transcribed. Our study not only answers some fundamental questions on the topic of de novo emergence but also provides a modeling framework for future studies.
Affiliation(s)
- Bharat Ravi Iyengar
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
| | - Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Department of Protein Evolution, Max Planck Institute for Biology Tübingen, Tübingen, Germany
| |
|
67
|
Aubel M, Eicholt L, Bornberg-Bauer E. Assessing structure and disorder prediction tools for de novo emerged proteins in the age of machine learning. F1000Res 2023; 12:347. [PMID: 37113259 PMCID: PMC10126731 DOI: 10.12688/f1000research.130443.1] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Accepted: 03/17/2023] [Indexed: 03/31/2023] Open
Abstract
Background: De novo protein coding genes emerge from scratch in the non-coding regions of the genome and have, per definition, no homology to other genes. Therefore, their encoded de novo proteins belong to the so-called "dark protein space". So far, only four de novo protein structures have been experimentally approximated. Low homology, presumed high disorder and limited structures result in low confidence structural predictions for de novo proteins in most cases. Here, we look at the most widely used structure and disorder predictors and assess their applicability for de novo emerged proteins. Since AlphaFold2 is based on the generation of multiple sequence alignments and was trained on solved structures of largely conserved and globular proteins, its performance on de novo proteins remains unknown. More recently, natural language models of proteins have been used for alignment-free structure predictions, potentially making them more suitable for de novo proteins than AlphaFold2. Methods: We applied different disorder predictors (IUPred3 short/long, flDPnn) and structure predictors, AlphaFold2 on the one hand and language-based models (Omegafold, ESMfold, RGN2) on the other hand, to four de novo proteins with experimental evidence on structure. We compared the resulting predictions between the different predictors as well as to the existing experimental evidence. Results: Results from IUPred, the most widely used disorder predictor, depend heavily on the choice of parameters and differ significantly from flDPnn which has been found to outperform most other predictors in a comparative assessment study recently. Similarly, different structure predictors yielded varying results and confidence scores for de novo proteins. 
Conclusions: We suggest that, while in some cases protein language model based approaches might be more accurate than AlphaFold2, the structure prediction of de novo emerged proteins remains a difficult task for any predictor, be it disorder or structure.
Affiliation(s)
- Margaux Aubel
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, 48149, Germany
| | - Lars Eicholt
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, 48149, Germany
| | - Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, 48149, Germany
- Department Protein Evolution, Max Planck-Institute for Biology, Tuebingen, 72076, Germany
| |
|
68
|
Wang L, Tonsager AJ, Zheng W, Wang Y, Stessman D, Fang W, Stenback KE, Campbell A, Tanvir R, Zhang J, Cothron S, Wan D, Meng Y, Spalding MH, Nikolau BJ, Li L. Single-cell genetic models to evaluate orphan gene function: The case of QQS regulating carbon and nitrogen allocation. FRONTIERS IN PLANT SCIENCE 2023; 14:1126139. [PMID: 37051080 PMCID: PMC10084940 DOI: 10.3389/fpls.2023.1126139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2022] [Accepted: 03/13/2023] [Indexed: 06/19/2023]
Abstract
We demonstrate two synthetic single-cell systems that can be used to better understand how the acquisition of an orphan gene can affect complex phenotypes. The Arabidopsis orphan gene Qua-Quine Starch (QQS) has been identified as a regulator of carbon (C) and nitrogen (N) partitioning across multiple plant species. QQS modulates this important biotechnological trait by replacing NF-YB (Nuclear Factor Y, subunit B) in its interaction with NF-YC. In this study, we expand on these prior findings by developing Chlamydomonas reinhardtii and Saccharomyces cerevisiae strains to refactor the functional interactions between QQS and NF-Y subunits and thereby modulate C and N allocation. Expression of QQS in C. reinhardtii modulates C (i.e., starch) and N (i.e., protein) allocation by affecting interactions between NF-YC and NF-YB subunits. Studies in S. cerevisiae revealed similar functional interactions between QQS and the NF-YC homolog (HAP5), modulating C (i.e., glycogen) and N (i.e., protein) allocation. However, in S. cerevisiae both the NF-YA (HAP2) and NF-YB (HAP3) homologs appear to have redundant functions to enable QQS and HAP5 to affect C and N allocation. The genetically tractable systems developed herein exhibit the plasticity to modulate highly complex phenotypes.
Affiliation(s)
- Lei Wang
- Department of Biological Sciences, Mississippi State University, Mississippi State, MS, United States
| | - Andrew J. Tonsager
- Roy J. Carver Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA, United States
- Engineering Research Center for Biorenewable Chemicals, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
| | - Wenguang Zheng
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, United States
| | - Yingjun Wang
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, United States
| | - Dan Stessman
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, United States
| | - Wei Fang
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, United States
| | - Kenna E. Stenback
- Roy J. Carver Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA, United States
- Engineering Research Center for Biorenewable Chemicals, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
| | - Alexis Campbell
- Roy J. Carver Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA, United States
- Engineering Research Center for Biorenewable Chemicals, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
| | - Rezwan Tanvir
- Department of Biological Sciences, Mississippi State University, Mississippi State, MS, United States
| | - Jinjiang Zhang
- Department of Biological Sciences, Mississippi State University, Mississippi State, MS, United States
- Mississippi School for Mathematics and Science, Columbus, MS, United States
| | - Samuel Cothron
- Department of Biological Sciences, Mississippi State University, Mississippi State, MS, United States
| | - Dongli Wan
- Institute of Grassland Research, Chinese Academy of Agricultural Sciences, Hohhot, China
| | - Yan Meng
- Department of Agriculture, Alcorn State University, Lorman, MS, United States
| | - Martin H. Spalding
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, United States
| | - Basil J. Nikolau
- Roy J. Carver Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA, United States
- Engineering Research Center for Biorenewable Chemicals, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
| | - Ling Li
- Department of Biological Sciences, Mississippi State University, Mississippi State, MS, United States
| |
|
69
|
Barrera-Redondo J, Lotharukpong JS, Drost HG, Coelho SM. Uncovering gene-family founder events during major evolutionary transitions in animals, plants and fungi using GenEra. Genome Biol 2023; 24:54. [PMID: 36964572 PMCID: PMC10037820 DOI: 10.1186/s13059-023-02895-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2022] [Accepted: 03/10/2023] [Indexed: 03/26/2023] Open
Abstract
We present GenEra (https://github.com/josuebarrera/GenEra), a DIAMOND-fueled gene-family founder inference framework that addresses previously raised limitations and biases in genomic phylostratigraphy, such as homology detection failure. GenEra also reduces computational time from several months to a few days for any genome of interest. We analyze the emergence of taxonomically restricted gene families during major evolutionary transitions in plants, animals, and fungi. Our results indicate that the impact of homology detection failure on inferred patterns of gene emergence is lineage-dependent, suggesting that plants are more prone to evolve novelty through the emergence of new genes compared to animals and fungi.
Affiliation(s)
- Josué Barrera-Redondo
- Department of Algal Development and Evolution, Max Planck Institute for Biology, Max-Planck-Ring 5, 72076, Tübingen, Germany.
| | - Jaruwatana Sodai Lotharukpong
- Department of Algal Development and Evolution, Max Planck Institute for Biology, Max-Planck-Ring 5, 72076, Tübingen, Germany
| | - Hajk-Georg Drost
- Computational Biology Group, Department of Molecular Biology, Max Planck Institute for Biology, Max-Planck-Ring 5, 72076, Tübingen, Germany.
| | - Susana M Coelho
- Department of Algal Development and Evolution, Max Planck Institute for Biology, Max-Planck-Ring 5, 72076, Tübingen, Germany.
| |
|
70
|
Sandmann CL, Schulz JF, Ruiz-Orera J, Kirchner M, Ziehm M, Adami E, Marczenke M, Christ A, Liebe N, Greiner J, Schoenenberger A, Muecke MB, Liang N, Moritz RL, Sun Z, Deutsch EW, Gotthardt M, Mudge JM, Prensner JR, Willnow TE, Mertins P, van Heesch S, Hubner N. Evolutionary origins and interactomes of human, young microproteins and small peptides translated from short open reading frames. Mol Cell 2023; 83:994-1011.e18. [PMID: 36806354 PMCID: PMC10032668 DOI: 10.1016/j.molcel.2023.01.023] [Citation(s) in RCA: 35] [Impact Index Per Article: 35.0] [Received: 09/08/2022] [Revised: 12/12/2022] [Accepted: 01/25/2023] [Indexed: 02/19/2023]
Abstract
All species continuously evolve short open reading frames (sORFs) that can be templated for protein synthesis and may provide raw materials for evolutionary adaptation. We analyzed the evolutionary origins of 7,264 recently cataloged human sORFs and found that most were evolutionarily young and had emerged de novo. We additionally identified 221 previously missed sORFs potentially translated into peptides of up to 15 amino acids, all of which are smaller than the smallest human microprotein annotated to date. To investigate the bioactivity of sORF-encoded small peptides and young microproteins, we subjected 266 candidates to a mass-spectrometry-based interactome screen with motif resolution. Based on these interactomes and additional cellular assays, we can associate several candidates with mRNA splicing, translational regulation, and endocytosis. Our work provides insights into the evolutionary origins and interaction potential of young and small proteins, thereby helping to elucidate this underexplored territory of the human proteome.
Affiliation(s)
- Clara-L Sandmann
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347 Berlin, Germany
| | - Jana F Schulz
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347 Berlin, Germany
| | - Jorge Ruiz-Orera
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
| | - Marieluise Kirchner
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Core Facility Proteomics, 10117 Berlin, Germany
| | - Matthias Ziehm
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Core Facility Proteomics, 10117 Berlin, Germany
| | - Eleonora Adami
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
| | - Maike Marczenke
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
| | - Annabel Christ
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
| | - Nina Liebe
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
| | - Johannes Greiner
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
| | - Aaron Schoenenberger
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
| | - Michael B Muecke
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347 Berlin, Germany; Charité-Universitätsmedizin, 10117 Berlin, Germany
| | - Ning Liang
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
| | | | - Zhi Sun
- Institute for Systems Biology, Seattle, WA 98109, USA
| | | | - Michael Gotthardt
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347 Berlin, Germany; Charité-Universitätsmedizin, 10117 Berlin, Germany
| | - Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - John R Prensner
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Division of Pediatric Hematology/Oncology, Boston Children's Hospital, Boston, MA 02115, USA
| | - Thomas E Willnow
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; Department of Biomedicine, Aarhus University, 8000 Aarhus, Denmark
| | - Philipp Mertins
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Core Facility Proteomics, 10117 Berlin, Germany
| | | | - Norbert Hubner
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347 Berlin, Germany; Charité-Universitätsmedizin, 10117 Berlin, Germany.
| |
|
71
|
Evolution and implications of de novo genes in humans. Nat Ecol Evol 2023:10.1038/s41559-023-02014-y. [PMID: 36928843 DOI: 10.1038/s41559-023-02014-y] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 07/24/2022] [Accepted: 02/06/2023] [Indexed: 03/18/2023]
Abstract
Genes and translated open reading frames (ORFs) that emerged de novo from previously non-coding sequences provide species with opportunities for adaptation. When aberrantly activated, some human-specific de novo genes and ORFs have disease-promoting properties-for instance, driving tumour growth. Thousands of putative de novo coding sequences have been described in humans, but we still do not know what fraction of those ORFs has readily acquired a function. Here, we discuss the challenges and controversies surrounding the detection, mechanisms of origin, annotation, validation and characterization of de novo genes and ORFs. Through manual curation of literature and databases, we provide a thorough table with most de novo genes reported for humans to date. We re-evaluate each locus by tracing the enabling mutations and list proposed disease associations, protein characteristics and supporting evidence for translation and protein detection. This work will support future explorations of de novo genes and ORFs in humans.
|
72
|
Luria V, Ma S, Shibata M, Pattabiraman K, Sestan N. Molecular and cellular mechanisms of human cortical connectivity. Curr Opin Neurobiol 2023; 80:102699. [PMID: 36921362 DOI: 10.1016/j.conb.2023.102699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2022] [Accepted: 02/05/2023] [Indexed: 03/18/2023]
Abstract
Comparative studies of the cerebral cortex have identified various human and primate-specific changes in both local and long-range connectivity, which are thought to underlie our advanced cognitive capabilities. These changes are likely mediated by the divergence of spatiotemporal regulation of gene expression, which is particularly prominent in the prenatal and early postnatal human and non-human primate cerebral cortex. In this review, we describe recent advances in characterizing human and primate genetic and cellular innovations including identification of novel species-specific, especially human-specific, genes, gene expression patterns, and cell types. Finally, we highlight three recent studies linking these molecular changes to reorganization of cortical connectivity.
Affiliation(s)
- Victor Luria
- Department of Neuroscience, Yale School of Medicine, New Haven, CT, 06510, USA
| | - Shaojie Ma
- Department of Neuroscience, Yale School of Medicine, New Haven, CT, 06510, USA
| | - Mikihito Shibata
- Department of Neuroscience, Yale School of Medicine, New Haven, CT, 06510, USA
| | - Kartik Pattabiraman
- Yale Child Study Center, Yale School of Medicine, New Haven, CT, 06510, USA.
| | - Nenad Sestan
- Department of Neuroscience, Yale School of Medicine, New Haven, CT, 06510, USA; Yale Child Study Center, Yale School of Medicine, New Haven, CT, 06510, USA; Departments of Psychiatry, Genetics and Comparative Medicine, Program in Cellular Neuroscience, Neurodegeneration and Repair, and Kavli Institute for Neuroscience, Yale School of Medicine, New Haven, CT, 06510, USA.
|
73
|
Karlowski WM, Varshney D, Zielezinski A. Taxonomically Restricted Genes in Bacillus may Form Clusters of Homologs and Can be Traced to a Large Reservoir of Noncoding Sequences. Genome Biol Evol 2023; 15:7039703. [PMID: 36790099 PMCID: PMC10003748 DOI: 10.1093/gbe/evad023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Revised: 01/09/2023] [Accepted: 02/08/2023] [Indexed: 02/16/2023]
Abstract
Taxonomically restricted genes (TRGs) are unique for a defined group of organisms and may act as potential genetic determinants of lineage-specific, biological properties. Here, we explore the TRGs of highly diverse and economically important Bacillus bacteria by examining commonly used TRG identification parameters and data sources. We show the significant effects of sequence similarity thresholds, composition, and the size of the reference database in the identification process. Subsequently, we applied stringent TRG search parameters and expanded the identification procedure by incorporating an analysis of noncoding and non-syntenic regions of non-Bacillus genomes. A multiplex annotation procedure minimized the number of false-positive TRG predictions and showed nearly one-third of the alleged TRGs could be mapped to genes missed in genome annotations. We traced the putative origin of TRGs by identifying homologous, noncoding genomic regions in non-Bacillus species and detected sequence changes that could transform these regions into protein-coding genes. In addition, our analysis indicated that Bacillus TRGs represent a specific group of genes mostly showing intermediate sequence properties between genes that are conserved across multiple taxa and nonannotated peptides encoded by open reading frames.
Affiliation(s)
- Wojciech M Karlowski
- Department of Computational Biology, Adam Mickiewicz University in Poznan, Uniwersytetu Poznanskiego 6, Poznan, Poland
| | - Deepti Varshney
- Department of Computational Biology, Adam Mickiewicz University in Poznan, Uniwersytetu Poznanskiego 6, Poznan, Poland
| | - Andrzej Zielezinski
- Department of Computational Biology, Adam Mickiewicz University in Poznan, Uniwersytetu Poznanskiego 6, Poznan, Poland
|
74
|
Deng S. The origin of genetic and metabolic systems: Evolutionary structural insights. Heliyon 2023; 9:e14466. [PMID: 36967965 PMCID: PMC10036676 DOI: 10.1016/j.heliyon.2023.e14466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2022] [Revised: 02/27/2023] [Accepted: 03/06/2023] [Indexed: 03/16/2023]
Abstract
DNA is derived from reverse transcription and its origin is related to reverse transcriptase, DNA polymerase and integrase. The gene structure originated from the evolution of the first RNA polymerase. Thus, an explanation of the origin of the genetic system must also explain the evolution of these enzymes. This paper proposes a polymer structure model, termed the stable complex evolution model, which explains the evolution of enzymes and functional molecules. Enzymes evolved their functions by forming locally tightly packed complexes with specific substrates. A metabolic reaction can therefore be considered to be the result of adaptive evolution in this way when a certain essential molecule is lacking in a cell. The evolution of the primitive genetic and metabolic systems was thus coordinated and synchronized. According to the stable complex model, almost all functional molecules establish binding affinity and specific recognition through complementary interactions, and functional molecules therefore have the nature of being auto-reactive. This is thermodynamically favorable and leads to functional duplication and self-organization. Therefore, it can be speculated that biological systems have a certain tendency to maintain functional stability or are influenced by an inherent selective power. The evolution of dormant bacteria may support this hypothesis, and inherent selectivity can be unified with natural selection at the molecular level.
Affiliation(s)
- Shaojie Deng
- Chongqing (Fengjie) Municipal Bureau of Planning and Natural Resources, China
|
75
|
Poretti M, Praz CR, Sotiropoulos AG, Wicker T. A survey of lineage-specific genes in Triticeae reveals de novo gene evolution from genomic raw material. Plant Direct 2023; 7:e484. [PMID: 36937792 PMCID: PMC10020141 DOI: 10.1002/pld3.484] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/27/2022] [Revised: 01/26/2023] [Accepted: 01/27/2023] [Indexed: 06/18/2023]
Abstract
Diploid plant genomes typically contain ~35,000 genes, almost all belonging to highly conserved gene families. Only a small fraction are lineage-specific, that is, found in only one or a few closely related species. Little is known about how genes arise de novo in plant genomes and how often this occurs; however, they are believed to be important for plant diversification and adaptation. We developed a pipeline to identify lineage-specific genes in Triticeae, using newly available genome assemblies of wheat, barley, and rye. Applying a set of stringent criteria, we identified 5942 candidate Triticeae-specific genes (TSGs), of which 2337 were validated as protein-coding genes in wheat. Differential gene expression analyses revealed that stress-induced wheat TSGs are strongly enriched in putative secreted proteins. Some were previously described to be involved in Triticeae non-host resistance and cold response. Additionally, we show that 1079 TSGs have sequence homology to transposable elements (TEs), ~68% of them deriving from regulatory non-coding regions of Gypsy retrotransposons. Most importantly, we demonstrate that these TSGs are enriched in transmembrane domains and are among the most highly expressed wheat genes overall. To summarize, we conclude that de novo gene formation is relatively rare and that Triticeae probably possess ~779 lineage-specific genes per haploid genome. TSGs, which respond to pathogen and environmental stresses, may be interesting candidates for future targeted resistance breeding in Triticeae. Finally, we propose that non-coding regions of TEs might provide important genetic raw material for the functional innovation of TM domains and the evolution of novel secreted proteins.
Affiliation(s)
- Manuel Poretti
- Department of Plant and Microbial Biology, University of Zurich, Zurich, Switzerland
- Department of Biology, University of Fribourg, Fribourg, Switzerland
| | - Coraline R. Praz
- Department of Plant and Microbial Biology, University of Zurich, Zurich, Switzerland
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM)–Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Madrid, Spain
| | | | - Thomas Wicker
- Department of Plant and Microbial Biology, University of Zurich, Zurich, Switzerland
|
76
|
Chen Y, Ma T, Zhang T, Ma L. Trends in the evolution of intronless genes in Poaceae. Frontiers in Plant Science 2023; 14:1065631. [PMID: 36875616 PMCID: PMC9978806 DOI: 10.3389/fpls.2023.1065631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2022] [Accepted: 02/01/2023] [Indexed: 06/18/2023]
Abstract
Intronless genes (IGs), which are a feature of prokaryotes, are a fascinating group of genes that are also present in eukaryotes. In the current study, a comparison of Poaceae genomes revealed that the origin of IGs may have involved ancient intronic splicing, reverse transcription, and retrotranspositions. Additionally, IGs exhibit the typical features of rapid evolution, including recent duplications, variable copy numbers, low divergence between paralogs, and high non-synonymous to synonymous substitution ratios. By tracing IG families along the phylogenetic tree, we determined that the evolutionary dynamics of IGs differed among Poaceae subfamilies. IG families developed rapidly before the divergence of Pooideae and Oryzoideae and expanded slowly after the divergence. In contrast, they emerged gradually and consistently in the Chloridoideae and Panicoideae clades during evolution. Furthermore, IGs are expressed at low levels. Under relaxed selection pressure, retrotranspositions, intron loss, and gene duplications and conversions may promote the evolution of IGs. The comprehensive characterization of IGs is critical for in-depth studies on intron functions and evolution as well as for assessing the importance of introns in eukaryotes.
Affiliation(s)
- Yong Chen
| | | | | | - Lei Ma
- *Correspondence: Tingting Zhang; Lei Ma
|
77
|
Yu J, Jiang W, Zhu SB, Liao Z, Dou X, Liu J, Guo FB, Dong C. Prediction of protein-coding small ORFs in multi-species using integrated sequence-derived features and the random forest model. Methods 2023; 210:10-19. [PMID: 36621557 DOI: 10.1016/j.ymeth.2022.12.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Revised: 12/27/2022] [Accepted: 12/30/2022] [Indexed: 01/07/2023]
Abstract
Proteins encoded by small open reading frames (sORFs) can serve as functional elements playing important roles in vivo. Such sORFs also constitute a potential pool for facilitating de novo gene birth, driving evolutionary innovation and species diversity. Therefore, their theoretical and experimental identification has become a critical issue. Herein, we propose a protein-coding sORF prediction method based solely on integrative sequence-derived features. Its prediction performance is better than or comparable to that of nine other prevalent methods, which shows that our method can provide a relatively reliable research tool for the prediction of protein-coding sORFs. Our method allows users to estimate the potential expression of a queried sORF, as demonstrated by the correlation analysis between our possibility estimation and the codon adaptation index (CAI). Based on the features that we used, we demonstrated that the sequence features of the protein-coding sORFs in the two domains differ significantly, implying that cross-domain prediction might be a relatively hard task; hence, domain-specific models were developed, which allow users to predict protein-coding sORFs in both eukaryotes and prokaryotes. Finally, a web server was developed to facilitate study in the related field; it is freely available at http://guolab.whu.edu.cn/codingCapacity/index.html.
Affiliation(s)
- Jiafeng Yu
- Shandong Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
| | - Wenwen Jiang
- Department of Bioinformatics, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
| | - Sen-Bin Zhu
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
| | - Zhen Liao
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
| | - Xianghua Dou
- Shandong Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
| | - Jian Liu
- Shandong Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
| | - Feng-Biao Guo
- School of Pharmaceutical Sciences, Wuhan University, Wuhan 430071, China.
| | - Chuan Dong
- School of Pharmaceutical Sciences, Wuhan University, Wuhan 430071, China.
|
78
|
An NA, Zhang J, Mo F, Luan X, Tian L, Shen QS, Li X, Li C, Zhou F, Zhang B, Ji M, Qi J, Zhou WZ, Ding W, Chen JY, Yu J, Zhang L, Shu S, Hu B, Li CY. De novo genes with an lncRNA origin encode unique human brain developmental functionality. Nat Ecol Evol 2023; 7:264-278. [PMID: 36593289 PMCID: PMC9911349 DOI: 10.1038/s41559-022-01925-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 11/26/2021] [Accepted: 10/04/2022] [Indexed: 01/03/2023]
Abstract
Human de novo genes can originate from neutral long non-coding RNA (lncRNA) loci and are evolutionarily significant in general, yet how and why this all-or-nothing transition to functionality happens remains unclear. Here, in 74 human/hominoid-specific de novo genes, we identified distinctive U1 elements and RNA splice-related sequences accounting for RNA nuclear export, differentiating mRNAs from lncRNAs, and driving the origin of de novo genes from lncRNA loci. The polymorphic sites facilitating the lncRNA-mRNA conversion through regulating nuclear export are selectively constrained, maintaining a boundary that differentiates mRNAs from lncRNAs. The functional new genes actively passing through it thus showed a mode of pre-adaptive origin, in that they acquire functions along with the achievement of their coding potential. As a proof of concept, we verified the regulations of splicing and U1 recognition on the nuclear export efficiency of one of these genes, the ENSG00000205704, in human neural progenitor cells. Notably, knock-out or over-expression of this gene in human embryonic stem cells accelerates or delays the neuronal maturation of cortical organoids, respectively. The transgenic mice with ectopically expressed ENSG00000205704 showed enlarged brains with cortical expansion. We thus demonstrate the key roles of nuclear export in de novo gene origin. These newly originated genes should reflect the novel uniqueness of human brain development.
Affiliation(s)
- Ni A An
- Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
| | - Jie Zhang
- Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
| | - Fan Mo
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Xuke Luan
- Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
| | - Lu Tian
- Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
| | - Qing Sunny Shen
- Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
| | - Xiangshang Li
- Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
| | - Chunqiong Li
- Chinese Institute for Brain Research, Beijing, China
| | - Fanqi Zhou
- State Key Laboratory of Medical Molecular Biology, Key Laboratory of RNA Regulation and Hematopoiesis, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, School of Basic Medicine, CAMS and Peking Union Medical College, Beijing, China
| | - Boya Zhang
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Mingjun Ji
- Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
| | - Jianhuan Qi
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Wei-Zhen Zhou
- State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Wanqiu Ding
- Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
| | - Jia-Yu Chen
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Chemistry and Biomedicine Innovation Center (ChemBIC), Nanjing University, Nanjing, China
| | - Jia Yu
- State Key Laboratory of Medical Molecular Biology, Key Laboratory of RNA Regulation and Hematopoiesis, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, School of Basic Medicine, CAMS and Peking Union Medical College, Beijing, China
| | - Li Zhang
- Chinese Institute for Brain Research, Beijing, China
| | - Shaokun Shu
- Peking University International Cancer Institute, Beijing, China
| | - Baoyang Hu
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China.
- University of Chinese Academy of Sciences, Beijing, China.
| | - Chuan-Yun Li
- Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, Peking University, Beijing, China.
- Chinese Institute for Brain Research, Beijing, China.
|
79
|
Affiliation(s)
- April Rich
- Department of Computational and Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, University of Pittsburgh Medical School, Pittsburgh, PA, USA
| | - Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, University of Pittsburgh Medical School, Pittsburgh, PA, USA.
|
80
|
Li J, Shen J, Wang R, Chen Y, Zhang T, Wang H, Guo C, Qi J. The nearly complete assembly of the Cercis chinensis genome and Fabaceae phylogenomic studies provide insights into new gene evolution. Plant Communications 2023; 4:100422. [PMID: 35957520 PMCID: PMC9860166 DOI: 10.1016/j.xplc.2022.100422] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/01/2022] [Revised: 08/02/2022] [Accepted: 08/05/2022] [Indexed: 05/27/2023]
Abstract
Fabaceae is a large family of angiosperms with high biodiversity that contains a variety of economically important crops and model plants for the study of biological nitrogen fixation. Polyploidization events have been extensively studied in some Fabaceae plants, but the occurrence of new genes is still concealed, owing to a lack of genomic information on certain species of the basal clade of Fabaceae. Cercis chinensis (Cercidoideae) is one such species; it diverged earliest from Fabaceae and is essential for phylogenomic studies and new gene predictions in Fabaceae. To facilitate genomic studies on Fabaceae, we performed genome sequencing of C. chinensis and obtained a 352.84 Mb genome, which was further assembled into seven pseudochromosomes with 30 612 predicted protein-coding genes. Compared with other legume genomes, that of C. chinensis exhibits no lineage-specific polyploidization event. Further phylogenomic analyses of 22 legumes and 11 other angiosperms revealed that many gene families are lineage specific before and after the diversification of Fabaceae. Among them, dozens of genes are candidates for new genes that have evolved from intergenic regions and are thus regarded as de novo-originated genes. They differ significantly from established genes in coding sequence length, exon number, guanine-cytosine content, and expression patterns among tissues. Functional analysis revealed that many new genes are related to asparagine metabolism. This study represents an important advance in understanding the evolutionary pattern of new genes in legumes and provides a valuable resource for plant phylogenomic studies.
Affiliation(s)
- Jinglong Li
- State Key Laboratory of Genetic Engineering, Institute of Plant Biology, School of Life Sciences, Fudan University, Shanghai 200433, China
| | - Jingting Shen
- State Key Laboratory of Genetic Engineering, Institute of Plant Biology, School of Life Sciences, Fudan University, Shanghai 200433, China
| | - Rui Wang
- State Key Laboratory of Genetic Engineering, Institute of Plant Biology, School of Life Sciences, Fudan University, Shanghai 200433, China
| | - Yamao Chen
- State Key Laboratory of Genetic Engineering, Institute of Plant Biology, School of Life Sciences, Fudan University, Shanghai 200433, China
| | - Taikui Zhang
- State Key Laboratory of Genetic Engineering, Institute of Plant Biology, School of Life Sciences, Fudan University, Shanghai 200433, China
| | - Haifeng Wang
- College of Agriculture, Guangxi University, Nanning 530004, China
| | - Chunce Guo
- Jiangxi Provincial Key Laboratory for Bamboo Germplasm Resources and Utilization, Forestry College, Jiangxi Agricultural University, Nanchang 330045, China
| | - Ji Qi
- State Key Laboratory of Genetic Engineering, Institute of Plant Biology, School of Life Sciences, Fudan University, Shanghai 200433, China.
|
81
|
Nie S, Zhao SW, Shi TL, Zhao W, Zhang RG, Tian XC, Guo JF, Yan XM, Bao YT, Li ZC, Kong L, Ma HY, Chen ZY, Liu H, El-Kassaby YA, Porth I, Yang FS, Mao JF. Gapless genome assembly of azalea and multi-omics investigation into divergence between two species with distinct flower color. Horticulture Research 2023; 10:uhac241. [PMID: 36643737 PMCID: PMC9832866 DOI: 10.1093/hr/uhac241] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/15/2022] [Accepted: 10/17/2022] [Indexed: 05/09/2023]
Abstract
The genus Rhododendron (Ericaceae), with more than 1000 species highly diverse in flower color, provides distinct ornamental value and a model system for flower color studies. Here, we investigated the divergence between two parental species with different flower colors that are widely used for azalea breeding. A gapless genome assembly was generated for the yellow-flowered azalea, Rhododendron molle. Comparative genomics found that recent proliferation of long terminal repeat retrotransposons (LTR-RTs), especially Gypsy, has resulted in a 125 Mb (19%) genome size increase in species-specific regions, and a significant number of dispersed gene duplicates (13 402) and pseudogenes (17 437). Metabolomic assessment revealed that yellow flower coloration is attributed to the dynamic changes of carotenoids/flavonols biosynthesis and chlorophyll degradation. Time-ordered gene co-expression networks (TO-GCNs) and the comparison confirmed the metabolome and uncovered the specific gene regulatory changes underpinning the distinct flower pigmentation. B3 and ERF TFs were found to dominate the gene regulation of carotenoids/flavonols characterized pigmentation in R. molle, while WRKY, ERF, WD40, C2H2, and NAC TFs collectively regulated the anthocyanins characterized pigmentation in the red-flowered R. simsii. This study employed a multi-omics strategy in disentangling the complex divergence between two important azaleas and provided references for further functional genetics and molecular breeding.
Affiliation(s)
- Shuai Nie
- National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
| | - Shi-Wei Zhao
- National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
| | - Tian-Le Shi
- National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
| | - Wei Zhao
- Department of Ecology and Environmental Science, Umeå Plant Science Centre, Umeå University, SE-901 87 Umeå, Sweden
| | - Ren-Gang Zhang
- Department of Bioinformatics, Ori (Shandong) Gene Science and Technology Co., Ltd., Weifang 261322, China
| | - Xue-Chan Tian
- National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
| | - Jing-Fang Guo
- National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
| | - Xue-Mei Yan
- National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
| | - Yu-Tao Bao
- National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
| | - Zhi-Chao Li
- National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
| | - Lei Kong
- National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
| | - Hai-Yao Ma
- National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
| | - Zhao-Yang Chen
- National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
| | - Hui Liu
- National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
| | - Yousry A El-Kassaby
- Department of Forest and Conservation Sciences, Faculty of Forestry, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
| | - Ilga Porth
- Département des Sciences du Bois et de la Forêt, Faculté de Foresterie, de Géographie et Géomatique, Université Laval, Québec, QC, G1V 0A6, Canada
|
82
|
Moreyra NN, Almeida FC, Allan C, Frankel N, Matzkin LM, Hasson E. Phylogenomics provides insights into the evolution of cactophily and host plant shifts in Drosophila. Mol Phylogenet Evol 2023; 178:107653. [PMID: 36404461 DOI: 10.1016/j.ympev.2022.107653] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/16/2022] [Revised: 09/30/2022] [Accepted: 10/25/2022] [Indexed: 11/06/2022]
Abstract
Cactophilic species of the Drosophila buzzatii cluster (repleta group) comprise an excellent model group to investigate genomic changes underlying adaptation to extreme climate conditions and host plants. In particular, these species form a tractable system to study the transition from chemically simpler breeding sites (like prickly pears of the genus Opuntia) to chemically more complex hosts (columnar cacti). Here, we report four highly contiguous genome assemblies of three species of the buzzatii cluster. Based on this genomic data and inferred phylogenetic relationships, we identified candidate taxonomically restricted genes (TRGs) likely involved in the evolution of cactophily and cactus host specialization. Functional enrichment analyses of TRGs within the buzzatii cluster identified genes involved in detoxification, water preservation, immune system response, anatomical structure development, and morphogenesis. In contrast, processes that regulate responses to stress, as well as the metabolism of nitrogen compounds, transport, and secretion were found in the set of species that are columnar cacti dwellers. These findings are in line with the hypothesis that those genomic changes brought about key mechanisms underlying the adaptation of the buzzatii cluster species to arid regions in South America.
Affiliation(s)
- Nicolás Nahuel Moreyra
- Departamento de Ecología, Genética y Evolución (EGE), Facultad de Ciencias Exactas y Naturales (FCEyN), Universidad de Buenos Aires (UBA), Ciudad Autónoma de Buenos Aires C1428EGA, Argentina; Instituto de Ecología, Genética y Evolución de Buenos Aires (IEGEBA), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Ciudad Autónoma de Buenos Aires C1428EGA, Argentina.
- Francisca Cunha Almeida
- Departamento de Ecología, Genética y Evolución (EGE), Facultad de Ciencias Exactas y Naturales (FCEyN), Universidad de Buenos Aires (UBA), Ciudad Autónoma de Buenos Aires C1428EGA, Argentina; Instituto de Ecología, Genética y Evolución de Buenos Aires (IEGEBA), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Ciudad Autónoma de Buenos Aires C1428EGA, Argentina.
- Carson Allan
- Department of Entomology, University of Arizona, Tucson, AZ 85719, USA.
- Nicolás Frankel
- Departamento de Ecología, Genética y Evolución (EGE), Facultad de Ciencias Exactas y Naturales (FCEyN), Universidad de Buenos Aires (UBA), Ciudad Autónoma de Buenos Aires C1428EGA, Argentina; Instituto de Ecología, Genética y Evolución de Buenos Aires (IEGEBA), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Ciudad Autónoma de Buenos Aires C1428EGA, Argentina.
- Esteban Hasson
- Departamento de Ecología, Genética y Evolución (EGE), Facultad de Ciencias Exactas y Naturales (FCEyN), Universidad de Buenos Aires (UBA), Ciudad Autónoma de Buenos Aires C1428EGA, Argentina; Instituto de Ecología, Genética y Evolución de Buenos Aires (IEGEBA), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Ciudad Autónoma de Buenos Aires C1428EGA, Argentina.
|
83
|
Vakirlis N, Vance Z, Duggan KM, McLysaght A. De novo birth of functional microproteins in the human lineage. Cell Rep 2022; 41:111808. [PMID: 36543139 PMCID: PMC10073203 DOI: 10.1016/j.celrep.2022.111808] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 10/18/2021] [Revised: 06/21/2022] [Accepted: 11/18/2022] [Indexed: 12/24/2022] Open
Abstract
Small open reading frames (sORFs) can encode functional "microproteins" that perform crucial biological tasks. However, their size makes them less amenable to genomic analysis, and their origins and conservation are poorly understood. Given their short length, it is plausible that some of these functional microproteins have recently originated entirely de novo from noncoding sequences. Here we sought to identify such cases in the human lineage by reconstructing the evolutionary origins of human microproteins previously found to have measurable, statistically significant fitness effects. By tracing the formation of each ORF and its transcriptional activation, we show that novel microproteins with significant phenotypic effects have emerged de novo throughout animal evolution, including two after the human-chimpanzee split. Notably, traditional methods for assessing coding potential would miss most of these cases. This evidence demonstrates that the functional potential intrinsic to sORFs can be relatively rapidly and frequently realized through de novo gene emergence.
Affiliation(s)
- Nikolaos Vakirlis
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center "Alexander Fleming", Vari, Greece.
- Zoe Vance
- Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, Ireland
- Kate M Duggan
- Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, Ireland
- Aoife McLysaght
- Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, Ireland.
|
84
|
The Theory of Carcino-Evo-Devo and Its Non-Trivial Predictions. Genes (Basel) 2022; 13:genes13122347. [PMID: 36553613 PMCID: PMC9777766 DOI: 10.3390/genes13122347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2022] [Revised: 12/04/2022] [Accepted: 12/08/2022] [Indexed: 12/15/2022] Open
Abstract
To explain the sources of additional cell masses in the evolution of multicellular organisms, the theory of carcino-evo-devo, or evolution by tumor neofunctionalization, has been developed. The important demand for a new theory in experimental science is the capability to formulate non-trivial predictions which can be experimentally confirmed. Several non-trivial predictions were formulated using carcino-evo-devo theory, four of which are discussed in the present paper: (1) The number of cellular oncogenes should correspond to the number of cell types in the organism. The evolution of oncogenes, tumor suppressor and differentiation gene classes should proceed concurrently. (2) Evolutionarily new and evolving genes should be specifically expressed in tumors (TSEEN genes). (3) Human orthologs of fish TSEEN genes should acquire progressive functions connected with new cell types, tissues and organs. (4) Selection of tumors for new functions in the organism is possible. Evolutionarily novel organs should recapitulate tumor features in their development. As shown in this paper, these predictions have been confirmed by the laboratory of the author. Thus, we have shown that carcino-evo-devo theory has predictive power, fulfilling a fundamental requirement for a new theory.
|
85
|
Posadas-García YS, Espinosa-Soto C. Early effects of gene duplication on the robustness and phenotypic variability of gene regulatory networks. BMC Bioinformatics 2022; 23:509. [DOI: 10.1186/s12859-022-05067-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Accepted: 11/18/2022] [Indexed: 11/29/2022] Open
Abstract
Background
Research on gene duplication is abundant and comes from a wide range of approaches, from high-throughput analyses and experimental evolution to bioinformatics and theoretical models. Notwithstanding, a consensus is still lacking regarding evolutionary mechanisms involved in evolution through gene duplication as well as the conditions that affect them. We argue that a better understanding of evolution through gene duplication requires considering explicitly that genes do not act in isolation. It demands studying how the perturbation that gene duplication implies percolates through the web of gene interactions. Due to evolution’s contingent nature, the paths that lead to the final fate of duplicates must depend strongly on the early stages of gene duplication, before gene copies have accumulated distinctive changes.
Methods
Here we use a widely-known model of gene regulatory networks to study how gene duplication affects network behavior in early stages. Such networks comprise sets of genes that cross-regulate. They organize gene activity creating the gene expression patterns that give cells their phenotypic properties. We focus on how duplication affects two evolutionarily relevant properties of gene regulatory networks: mitigation of the effect of new mutations and access to new phenotypic variants through mutation.
Results
Among other observations, we find that those networks that are better at maintaining the original phenotype after duplication are usually also better at buffering the effect of single interaction mutations and that duplication tends to enhance further this ability. Moreover, the effect of mutations after duplication depends on both the kind of mutation and genes involved in it. We also found that those phenotypes that had easier access through mutation before duplication had higher chances of remaining accessible through new mutations after duplication.
Conclusion
Our results support that gene duplication often mitigates the impact of new mutations and that this effect is not merely due to changes in the number of genes. The work that we put forward helps to identify conditions under which gene duplication may enhance evolvability and robustness to mutations.
|
86
|
Translation and natural selection of micropeptides from long non-canonical RNAs. Nat Commun 2022; 13:6515. [PMID: 36316320 PMCID: PMC9622821 DOI: 10.1038/s41467-022-34094-y] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Received: 04/26/2022] [Accepted: 10/13/2022] [Indexed: 12/25/2022] Open
Abstract
Long noncoding RNAs (lncRNAs) are transcripts longer than 200 nucleotides but lacking canonical coding sequences. Apparently unable to produce peptides, lncRNA function seems to rely only on RNA expression, sequence and structure. Here, we exhaustively detect in-vivo translation of small open reading frames (small ORFs) within lncRNAs using Ribosomal profiling during Drosophila melanogaster embryogenesis. We show that around 30% of lncRNAs contain small ORFs engaged by ribosomes, leading to regulated translation of 100 to 300 micropeptides. We identify lncRNA features that favour translation, such as cistronicity, Kozak sequences, and conservation. For the latter, we develop a bioinformatics pipeline to detect small ORF homologues, and reveal evidence of natural selection favouring the conservation of micropeptide sequence and function across evolution. Our results expand the repertoire of lncRNA biochemical functions, and suggest that lncRNAs give rise to novel coding genes throughout evolution. Since most lncRNAs contain small ORFs with as yet unknown translation potential, we propose to rename them "long non-canonical RNAs".
|
87
|
Cerqueira de Araujo A, Josse T, Sibut V, Urabe M, Asadullah A, Barbe V, Nakai M, Huguet E, Periquet G, Drezen JM. Chelonus inanitus bracovirus encodes lineage-specific proteins and truncated immune IκB-like factors. J Gen Virol 2022; 103. [DOI: 10.1099/jgv.0.001791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022] Open
Abstract
Bracoviruses and ichnoviruses are endogenous viruses of parasitic wasps that produce particles containing virulence genes expressed in host tissues and necessary for parasitism success. In the case of bracoviruses the particles are produced by conserved genes of nudiviral origin integrated permanently in the wasp genome, whereas the virulence genes can strikingly differ depending on the wasp lineage. To date most data obtained on bracoviruses concerned species from the braconid subfamily of Microgastrinae. To gain a broader view on the diversity of virulence genes we sequenced the genome packaged in the particles of Chelonus inanitus bracovirus (CiBV) produced by a wasp belonging to a different subfamily: the Cheloninae. These are egg-larval parasitoids, which means that they oviposit into the host egg and the wasp larvae then develop within the larval stages of the host. We found that most of CiBV virulence genes belong to families that are specific to Cheloninae. As other bracoviruses and ichnoviruses however, CiBV encode v-ank genes encoding truncated versions of the immune cactus/IκB factor, which suggests these proteins might play a key role in host–parasite interactions involving domesticated endogenous viruses. We found that the structures of CiBV V-ANKs are different from those previously reported. Phylogenetic analysis supports the hypothesis that they may originate from a cactus/IκB immune gene from the wasp genome acquired by the bracovirus. However, their evolutionary history is different from that shared by other V-ANKs, whose common origin probably reflects horizontal gene transfer events of virus sequences between braconid and ichneumonid wasps.
Affiliation(s)
- Thibaut Josse
- Institut de Recherche sur la Biologie de l'Insecte (IRBI), UMR 7261, CNRS - Université de Tours, Tours, France
- Vonick Sibut
- Institut de Recherche sur la Biologie de l'Insecte (IRBI), UMR 7261, CNRS - Université de Tours, Tours, France
- Mariko Urabe
- Graduate School of Agriculture, Tokyo University of Agriculture and Technology, Tokyo 183-8509, Japan
- Azam Asadullah
- Graduate School of Agriculture, Tokyo University of Agriculture and Technology, Tokyo 183-8509, Japan
- Valérie Barbe
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057 Evry, France
- Madoka Nakai
- Graduate School of Agriculture, Tokyo University of Agriculture and Technology, Tokyo 183-8509, Japan
- Elisabeth Huguet
- Institut de Recherche sur la Biologie de l'Insecte (IRBI), UMR 7261, CNRS - Université de Tours, Tours, France
- Georges Periquet
- Institut de Recherche sur la Biologie de l'Insecte (IRBI), UMR 7261, CNRS - Université de Tours, Tours, France
- Jean-Michel Drezen
- Institut de Recherche sur la Biologie de l'Insecte (IRBI), UMR 7261, CNRS - Université de Tours, Tours, France
|
88
|
Bruley A, Mornon JP, Duprat E, Callebaut I. Digging into the 3D Structure Predictions of AlphaFold2 with Low Confidence: Disorder and Beyond. Biomolecules 2022; 12:1467. [PMID: 36291675 PMCID: PMC9599455 DOI: 10.3390/biom12101467] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 09/14/2022] [Revised: 10/04/2022] [Accepted: 10/05/2022] [Indexed: 01/12/2023] Open
Abstract
AlphaFold2 (AF2) has created a breakthrough in biology by providing three-dimensional structure models for whole-proteome sequences, with unprecedented levels of accuracy. In addition, the AF2 pLDDT score, related to the model confidence, has been shown to provide a good measure of residue-wise disorder. Here, we combined AF2 predictions with pyHCA, a tool we previously developed to identify foldable segments and estimate their order/disorder ratio, from a single protein sequence. We focused our analysis on the AF2 predictions available for 21 reference proteomes (AFDB v1), in particular on their long foldable segments (>30 amino acids) that exhibit characteristics of soluble domains, as estimated by pyHCA. Among these segments, we provided a global analysis of those with very low pLDDT values along their entire length and compared their characteristics to those of segments with very high pLDDT values. We highlighted cases containing conditional order, as well as cases that could form well-folded structures but escape the AF2 prediction due to a shallow multiple sequence alignment and/or undocumented structure or fold. AF2 and pyHCA can therefore be advantageously combined to unravel cryptic structural features in whole proteomes and to refine predictions for different flavors of disorder.
|
89
|
Moutinho AF, Eyre-Walker A, Dutheil JY. Strong evidence for the adaptive walk model of gene evolution in Drosophila and Arabidopsis. PLoS Biol 2022; 20:e3001775. [PMID: 36099311 PMCID: PMC9470001 DOI: 10.1371/journal.pbio.3001775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2021] [Accepted: 08/01/2022] [Indexed: 11/19/2022] Open
Abstract
Understanding the dynamics of species adaptation to their environments has long been a central focus of the study of evolution. Theories of adaptation propose that populations evolve by “walking” in a fitness landscape. This “adaptive walk” is characterised by a pattern of diminishing returns, where populations further away from their fitness optimum take larger steps than those closer to their optimal conditions. Hence, we expect young genes to evolve faster and experience mutations with stronger fitness effects than older genes because they are further away from their fitness optimum. Testing this hypothesis, however, constitutes an arduous task. Young genes are small, encode proteins with a higher degree of intrinsic disorder, are expressed at lower levels, and are involved in species-specific adaptations. Since all these factors lead to increased protein evolutionary rates, they could be masking the effect of gene age. While controlling for these factors, we used population genomic data sets of Arabidopsis and Drosophila and estimated the rate of adaptive substitutions across genes from different phylostrata. We found that a gene’s evolutionary age significantly impacts the molecular rate of adaptation. Moreover, we observed that substitutions in young genes tend to have larger physicochemical effects. Our study, therefore, provides strong evidence that molecular evolution follows an adaptive walk model across a large evolutionary timescale. This study uses population genomic datasets from Arabidopsis and Drosophila to show that young genes adapt faster and are subject to mutations of larger fitness effects, providing strong evidence that molecular evolution follows an adaptive walk model across a large evolutionary timescale.
Affiliation(s)
- Ana Filipa Moutinho
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Adam Eyre-Walker
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Julien Y. Dutheil
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Unité Mixte de Recherche 5554 Institut des Sciences de l’Evolution, CNRS, IRD, EPHE, Université de Montpellier, Montpellier, France
|
90
|
Evolutionary New Genes in a Growing Paradigm. Genes (Basel) 2022; 13:genes13091605. [PMID: 36140774 PMCID: PMC9498540 DOI: 10.3390/genes13091605] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/25/2022] [Accepted: 08/28/2022] [Indexed: 11/26/2022] Open
|
91
|
Malekos E, Carpenter S. Short open reading frame genes in innate immunity: from discovery to characterization. Trends Immunol 2022; 43:741-756. [PMID: 35965152 PMCID: PMC10118063 DOI: 10.1016/j.it.2022.07.005] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 06/22/2022] [Revised: 07/11/2022] [Accepted: 07/13/2022] [Indexed: 12/27/2022]
Abstract
Next-generation sequencing (NGS) technologies have greatly expanded the size of the known transcriptome. Many newly discovered transcripts are classified as long noncoding RNAs (lncRNAs) which are assumed to affect phenotype through sequence and structure and not via translated protein products despite the vast majority of them harboring short open reading frames (sORFs). Recent advances have demonstrated that the noncoding designation is incorrect in many cases and that sORF-encoded peptides (SEPs) translated from these transcripts are important contributors to diverse biological processes. Interest in SEPs is at an early stage and there is evidence for the existence of thousands of SEPs that are yet unstudied. We hope to pique interest in investigating this unexplored proteome by providing a discussion of SEP characterization generally and describing specific discoveries in innate immunity.
Affiliation(s)
- Eric Malekos
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA; Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Susan Carpenter
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA; Department of Molecular Cell and Developmental Biology, University of California Santa Cruz, Santa Cruz, CA, USA.
|
92
|
Parikh SB, Houghton C, Van Oss SB, Wacholder A, Carvunis A. Origins, evolution, and physiological implications of de novo genes in yeast. Yeast 2022; 39:471-481. [PMID: 35959631 PMCID: PMC9544372 DOI: 10.1002/yea.3810] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/20/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 12/03/2022] Open
Abstract
De novo gene birth is the process by which new genes emerge in sequences that were previously noncoding. Over the past decade, researchers have taken advantage of the power of yeast as a model and a tool to study the evolutionary mechanisms and physiological implications of de novo gene birth. We summarize the mechanisms that have been proposed to explicate how noncoding sequences can become protein-coding genes, highlighting the discovery of pervasive translation of the yeast transcriptome and its presumed impact on evolutionary innovation. We summarize current best practices for the identification and characterization of de novo genes. Crucially, we explain that the field is still in its nascency, with the physiological roles of most young yeast de novo genes identified thus far still utterly unknown. We hope this review inspires researchers to investigate the true contribution of de novo gene birth to cellular physiology and phenotypic diversity across yeast strains and species.
Affiliation(s)
- Saurin B. Parikh
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and Evolution, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Carly Houghton
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and Evolution, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- S. Branden Van Oss
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and Evolution, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Aaron Wacholder
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and Evolution, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Anne‐Ruxandra Carvunis
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and Evolution, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
|
93
|
Liang Z, Luo Z, Zhang W, Yu K, Wang H, Geng B, Yang Q, Ni Z, Zeng C, Zheng Y, Li C, Yang S, Ma Y, Dai J. Synthetic refactor of essential genes decodes functionally constrained sequences in yeast genome. iScience 2022; 25:104982. [PMID: 36093046 PMCID: PMC9460170 DOI: 10.1016/j.isci.2022.104982] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/09/2022] [Revised: 07/14/2022] [Accepted: 08/16/2022] [Indexed: 11/28/2022] Open
Affiliation(s)
- Zhenzhen Liang
- CAS Key Laboratory of Quantitative Engineering Biology, Guangdong Provincial Key Laboratory of Synthetic Genomics and Shenzhen Key Laboratory of Synthetic Genomics, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Zhouqing Luo
- CAS Key Laboratory of Quantitative Engineering Biology, Guangdong Provincial Key Laboratory of Synthetic Genomics and Shenzhen Key Laboratory of Synthetic Genomics, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen 361102, China
- Corresponding author
- Weimin Zhang
- CAS Key Laboratory of Quantitative Engineering Biology, Guangdong Provincial Key Laboratory of Synthetic Genomics and Shenzhen Key Laboratory of Synthetic Genomics, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Institute for Systems Genetics and Department of Biochemistry and Molecular Pharmacology, New York University Langone Medical Center, New York, NY 10011, USA
- Kang Yu
- CAS Key Laboratory of Quantitative Engineering Biology, Guangdong Provincial Key Laboratory of Synthetic Genomics and Shenzhen Key Laboratory of Synthetic Genomics, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Hui Wang
- CAS Key Laboratory of Quantitative Engineering Biology, Guangdong Provincial Key Laboratory of Synthetic Genomics and Shenzhen Key Laboratory of Synthetic Genomics, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen 361102, China
- Binan Geng
- Hubei Collaborative Innovation Center for Green Transformation of Bio-resources, Environmental Microbial Technology Center of Hubei Province, Hubei Key Laboratory of Industrial Biotechnology, College of Life Sciences, Hubei University, Wuhan 430062, China
- Qing Yang
- Hubei Collaborative Innovation Center for Green Transformation of Bio-resources, Environmental Microbial Technology Center of Hubei Province, Hubei Key Laboratory of Industrial Biotechnology, College of Life Sciences, Hubei University, Wuhan 430062, China
- Zuoyu Ni
- CAS Key Laboratory of Quantitative Engineering Biology, Guangdong Provincial Key Laboratory of Synthetic Genomics and Shenzhen Key Laboratory of Synthetic Genomics, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Cheng Zeng
- CAS Key Laboratory of Quantitative Engineering Biology, Guangdong Provincial Key Laboratory of Synthetic Genomics and Shenzhen Key Laboratory of Synthetic Genomics, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Yihui Zheng
- Key Laboratory for Industrial Biocatalysis (Ministry of Education) and Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Chunyuan Li
- Key Laboratory for Industrial Biocatalysis (Ministry of Education) and Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Shihui Yang
- Hubei Collaborative Innovation Center for Green Transformation of Bio-resources, Environmental Microbial Technology Center of Hubei Province, Hubei Key Laboratory of Industrial Biotechnology, College of Life Sciences, Hubei University, Wuhan 430062, China
- Yingxin Ma
- CAS Key Laboratory of Quantitative Engineering Biology, Guangdong Provincial Key Laboratory of Synthetic Genomics and Shenzhen Key Laboratory of Synthetic Genomics, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Junbiao Dai
- CAS Key Laboratory of Quantitative Engineering Biology, Guangdong Provincial Key Laboratory of Synthetic Genomics and Shenzhen Key Laboratory of Synthetic Genomics, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Key Laboratory for Industrial Biocatalysis (Ministry of Education) and Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Corresponding author
|
94
|
Luan YX, Cui Y, Chen WJ, Jin JF, Liu AM, Huang CW, Potapov M, Bu Y, Zhan S, Zhang F, Li S. High-quality genomes reveal significant genetic divergence and cryptic speciation in the model organism Folsomia candida (Collembola). Mol Ecol Resour 2022; 23:273-293. [PMID: 35962787 PMCID: PMC10087712 DOI: 10.1111/1755-0998.13699] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/30/2021] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 12/01/2022]
Abstract
The collembolan Folsomia candida Willem, 1902, is widely distributed throughout the world and has been frequently used as a test organism in soil ecology and ecotoxicology studies. However, it is questioned as an ideal "standard" because of differences in reproductive modes and cryptic genetic diversity between strains from various geographical origins. In this study, we obtained two high-quality chromosome-level genomes of F. candida, for a parthenogenetic strain (named as FCDK, 219.08 Mb, 25,139 protein-coding genes) and a sexual strain (named as FCSH, 153.09 Mb, 21,609 protein-coding genes), reannotated the genome of the parthenogenetic strain reported by Faddeeva-Vakhrusheva et al. in 2017 (named as FCBL, 221.7 Mb, 25,980 protein-coding genes), and conducted comparative genomic analyses of three strains. High genome similarities between FCDK and FCBL on synteny, genome architecture, mitochondrial and nuclear gene sequences support they are conspecific. The seven chromosomes of FCDK are each 25-54% larger than the corresponding chromosomes of FCSH, showing obvious repetitive element expansions and large-scale inversions and translocations but no whole-genome duplication. The strain-specific genes, expanded gene families and genes in nonsyntenic chromosomal regions identified in FCDK are highly related to the broader environmental adaptation of parthenogenetic strains. In addition, FCDK has fewer strain-specific microRNAs than FCSH, and their mitochondrial and nuclear genes have diverged greatly. In conclusion, FCDK/FCBL and FCSH have accumulated independent genetic changes and evolved into distinct species since 10 Mya. Our work provides important genomic resources for studying the mechanisms of rapidly cryptic speciation and soil arthropod adaptation to soil ecosystems.
Affiliation(s)
- Yun-Xia Luan
- Guangdong Provincial Key Laboratory of Insect Development Biology and Applied Technology, Institute of Insect Science and Technology, School of Life Sciences, South China Normal University, Guangzhou, China; Guangdong Laboratory for Lingnan Modern Agriculture, Guangzhou, China
- Yingying Cui
- Guangdong Provincial Key Laboratory of Insect Development Biology and Applied Technology, Institute of Insect Science and Technology, School of Life Sciences, South China Normal University, Guangzhou, China
- Jian-Feng Jin
- Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing, China
- Ai-Min Liu
- Department of Pomology, College of Horticulture, South China Agricultural University, Guangzhou, China
- Cheng-Wang Huang
- CAS Key Laboratory of Insect Developmental and Evolutionary Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, China
- Yun Bu
- Natural History Research Center, Shanghai Natural History Museum, Shanghai Science & Technology Museum, Shanghai, China
- Shuai Zhan
- CAS Key Laboratory of Insect Developmental and Evolutionary Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, China
- Feng Zhang
- Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing, China
- Sheng Li
- Guangdong Provincial Key Laboratory of Insect Development Biology and Applied Technology, Institute of Insect Science and Technology, School of Life Sciences, South China Normal University, Guangzhou, China; Guangdong Laboratory for Lingnan Modern Agriculture, Guangzhou, China; Guangmeiyuan R&D Center, Guangdong Provincial Key Laboratory of Insect Developmental Biology and Applied Technology, South China Normal University, Meizhou, China
|
95
|
Abstract
"De novo" genes evolve from previously non-genic DNA. This strikes many of us as remarkable, because it seems extraordinarily unlikely that random sequence would produce a functional gene. How is this possible? In this two-part review, I first summarize what is known about the origins and molecular functions of the small number of de novo genes for which such information is available. I then speculate on what these examples may tell us about how de novo genes manage to emerge despite what seem like enormous opposing odds.
Affiliation(s)
- Caroline M Weisman
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA.
|
96
|
Song H, Guo Z, Zhang X, Sui J. De novo genes in Arachis hypogaea cv. Tifrunner: systematic identification, molecular evolution, and potential contributions to cultivated peanut. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2022; 111:1081-1095. [PMID: 35748398 DOI: 10.1111/tpj.15875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2021] [Revised: 06/15/2022] [Accepted: 06/21/2022] [Indexed: 06/15/2023]
Abstract
De novo genes are derived from non-coding sequences, and they can play essential roles in organisms. Cultivated peanut (Arachis hypogaea) is a major oil and protein crop derived from a cross between Arachis duranensis and Arachis ipaensis. However, few de novo genes have been documented in Arachis. Here, we identified 381 de novo genes in A. hypogaea cv. Tifrunner based on comparison with five closely related Arachis species. There are distinct differences in gene expression patterns and gene structures between conserved and de novo genes. The identified de novo genes originated from ancestral sequence regions associated with metabolic and biosynthetic processes, and they were subsequently integrated into existing regulatory networks. De novo paralogs and homoeologs were identified in A. hypogaea cv. Tifrunner. De novo paralogs and homoeologs with conserved expression have mismatching cis-acting elements under normal growth conditions. De novo genes potentially have pluripotent functions in responses to biotic stresses as well as in growth and development based on quantitative trait locus data. This work provides a foundation for future research examining gene birth processes and gene function in Arachis and related taxa.
Affiliation(s)
- Hui Song
- Grassland Agri-husbandry Research Center, College of Grassland Science, Qingdao Agricultural University, Qingdao, China
- Zhonglong Guo
- State Key Laboratory of Protein and Plant Gene Research, Peking-Tsinghua Center for Life Sciences, School of Life Sciences and School of Advanced Agricultural Sciences, Peking University, Beijing, China
- Xiaojun Zhang
- College of Agronomy, Qingdao Agricultural University, Qingdao, China
- Jiongming Sui
- College of Agronomy, Qingdao Agricultural University, Qingdao, China
|
97
|
Na Z, Dai X, Zheng SJ, Bryant CJ, Loh KH, Su H, Luo Y, Buhagiar AF, Cao X, Baserga SJ, Chen S, Slavoff SA. Mapping subcellular localizations of unannotated microproteins and alternative proteins with MicroID. Mol Cell 2022; 82:2900-2911.e7. [PMID: 35905735 PMCID: PMC9662605 DOI: 10.1016/j.molcel.2022.06.035] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 10/07/2021] [Revised: 04/08/2022] [Accepted: 06/29/2022] [Indexed: 11/15/2022]
Abstract
Proteogenomic identification of translated small open reading frames has revealed thousands of previously unannotated, largely uncharacterized microproteins, or polypeptides of less than 100 amino acids, and alternative proteins (alt-proteins) that are co-encoded with canonical proteins and are often larger. The subcellular localizations of microproteins and alt-proteins are generally unknown but can have significant implications for their functions. Proximity biotinylation is an attractive approach to define the protein composition of subcellular compartments in cells and in animals. Here, we developed a high-throughput technology to map unannotated microproteins and alt-proteins to subcellular localizations by proximity biotinylation with TurboID (MicroID). More than 150 microproteins and alt-proteins are associated with subnuclear organelles. One alt-protein, alt-LAMA3, localizes to the nucleolus and functions in pre-rRNA transcription. We applied MicroID in a mouse model, validating expression of a conserved nuclear microprotein, and establishing MicroID for discovery of microproteins and alt-proteins in vivo.
Affiliation(s)
- Zhenkun Na
- Department of Chemistry, Yale University, New Haven, CT 06520, USA; Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT 06516, USA
- Xiaoyun Dai
- Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA; Systems Biology Institute, Yale University, West Haven, CT 06516, USA
- Shu-Jian Zheng
- Department of Chemistry, Yale University, New Haven, CT 06520, USA; Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT 06516, USA
- Carson J Bryant
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06529, USA
- Ken H Loh
- Laboratory of Molecular Genetics, Howard Hughes Medical Institute, The Rockefeller University, New York, NY 10065, USA
- Haomiao Su
- Department of Chemistry, Yale University, New Haven, CT 06520, USA; Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT 06516, USA
- Yang Luo
- Department of Chemistry, Yale University, New Haven, CT 06520, USA; Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT 06516, USA
- Amber F Buhagiar
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06529, USA
- Xiongwen Cao
- Department of Chemistry, Yale University, New Haven, CT 06520, USA; Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT 06516, USA
- Susan J Baserga
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06529, USA; Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA; Department of Therapeutic Radiology, Yale University School of Medicine, New Haven, CT 06520, USA
- Sidi Chen
- Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA; Systems Biology Institute, Yale University, West Haven, CT 06516, USA
- Sarah A Slavoff
- Department of Chemistry, Yale University, New Haven, CT 06520, USA; Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT 06516, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06529, USA.
|
98
|
Manrubia S. The simple emergence of complex molecular function. PHILOSOPHICAL TRANSACTIONS. SERIES A, MATHEMATICAL, PHYSICAL, AND ENGINEERING SCIENCES 2022; 380:20200422. [PMID: 35599566 DOI: 10.1098/rsta.2020.0422] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 05/07/2023]
Abstract
At odds with a traditional view of molecular evolution that seeks a descent-with-modification relationship between functional sequences, new functions can emerge de novo with relative ease. At early times of molecular evolution, random polymers could have sufficed for the appearance of incipient chemical activity, while the cellular environment harbours a myriad of proto-functional molecules. The emergence of function is facilitated by several mechanisms intrinsic to molecular organization, such as redundant mapping of sequences into structures, phenotypic plasticity, modularity or cooperative associations between genomic sequences. It is the availability of niches in the molecular ecology that filters new potentially functional proposals. New phenotypes and subsequent levels of molecular complexity could be attained through combinatorial explorations of currently available molecular variants. Natural selection does the rest. This article is part of the theme issue 'Emergent phenomena in complex physical and socio-technical systems: from cells to societies'.
Affiliation(s)
- Susanna Manrubia
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain
- Systems Biology Department, National Biotechnology Centre (CSIC), c/Darwin 3, 28049 Madrid, Spain
|
99
|
Jiang M, Li X, Dong X, Zu Y, Zhan Z, Piao Z, Lang H. Research Advances and Prospects of Orphan Genes in Plants. FRONTIERS IN PLANT SCIENCE 2022; 13:947129. [PMID: 35874010 PMCID: PMC9305701 DOI: 10.3389/fpls.2022.947129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Accepted: 06/23/2022] [Indexed: 06/15/2023]
Abstract
Orphan genes (OGs) are defined as genes having no sequence similarity with genes present in other lineages. OGs are regarded as playing a key role in the development of lineage-specific adaptations and can also serve as a constant source of evolutionary novelty. These genes have often been found to be related to various stress responses, species-specific traits, and special expression regulation, and they also participate in primary substance metabolism. Advances in sequencing tools and genome analysis methods have made the identification and characterization of OGs comparatively easier. Significant progress has been made in the study of OG functions in plants. We review recent advances in the fast-evolving characteristics, expression modulation, and functional analysis of OGs, with a focus on their role in plant biology. We also emphasize current challenges and adoptable strategies, and discuss possible future directions for the functional study of OGs.
Affiliation(s)
- Mingliang Jiang
- School of Agriculture, Jilin Agricultural Science and Technology College, Jilin, China
- Xiaonan Li
- College of Horticulture, Shenyang Agricultural University, Shenyang, China
- Xiangshu Dong
- School of Agriculture, Yunnan University, Kunming, China
- Ye Zu
- College of Horticulture, Shenyang Agricultural University, Shenyang, China
- Zongxiang Zhan
- College of Horticulture, Shenyang Agricultural University, Shenyang, China
- Zhongyun Piao
- College of Horticulture, Shenyang Agricultural University, Shenyang, China
- Hong Lang
- School of Agriculture, Jilin Agricultural Science and Technology College, Jilin, China
|
100
|
Prabh N, Rödelsperger C. Multiple Pristionchus pacificus genomes reveal distinct evolutionary dynamics between de novo candidates and duplicated genes. Genome Res 2022; 32:1315-1327. [PMID: 35618417 PMCID: PMC9341508 DOI: 10.1101/gr.276431.121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2021] [Accepted: 05/20/2022] [Indexed: 01/03/2023]
Abstract
The birth of new genes is a major molecular innovation driving phenotypic diversity across all domains of life. Although repurposing of existing protein-coding material by duplication is considered the main process of new gene formation, recent studies have discovered thousands of transcriptionally active sequences as a rich source of new genes. However, differential loss rates have to be assumed to reconcile the high birth rates of these incipient de novo genes with the dominance of ancient gene families in individual genomes. Here, we test this rapid turnover hypothesis in the context of the nematode model organism Pristionchus pacificus. We extended the existing species-level phylogenomic framework by sequencing the genomes of six divergent P. pacificus strains. We used these data to study the evolutionary dynamics of different age classes and categories of origin at a population level. Contrasting de novo candidates with new families that arose by duplication and divergence from known genes, we find that de novo candidates are typically shorter, show less expression, and are overrepresented on the sex chromosome. Although the contribution of de novo candidates increases toward young age classes, multiple comparisons within the same age class showed significantly higher attrition in de novo candidates than in known genes. Similarly, young genes remain under weak evolutionary constraints with de novo candidates representing the fastest evolving subcategory. Altogether, this study provides empirical evidence for the rapid turnover hypothesis and highlights the importance of the evolutionary timescale when quantifying the contribution of different mechanisms toward new gene formation.
Affiliation(s)
- Neel Prabh
- Department for Integrative Evolutionary Biology, Max Planck Institute for Biology, 72076 Tübingen, Germany
- Christian Rödelsperger
- Department for Integrative Evolutionary Biology, Max Planck Institute for Biology, 72076 Tübingen, Germany
|