1
Guay SY, Patel PH, Thomalla JM, McDermott KL, O'Toole JM, Arnold SE, Obrycki SJ, Wolfner MF, Findlay GD. A newly evolved gene is essential for efficient sperm entry into eggs in Drosophila melanogaster. bioRxiv 2024:2024.08.08.607187. [PMID: 39149251 PMCID: PMC11326263 DOI: 10.1101/2024.08.08.607187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/17/2024]
Abstract
New genes arise through a variety of evolutionary processes and provide raw material for adaptation in the face of both natural and sexual selection. De novo evolved genes emerge from previously non-protein-coding DNA sequences, and many such genes are expressed in male reproductive structures. In Drosophila melanogaster, several putative de novo genes have evolved essential roles in spermatogenesis, but whether such genes can also impact sperm function beyond the male has not been investigated. We identified a putative de novo gene, katherine johnson (kj), that is required for high levels of male fertility. Males that do not express kj produce and transfer sperm that are stored normally in females, but sperm from these males enter eggs with severely reduced efficiency. Using a tagged transgenic rescue construct, we observed that KJ protein localizes to the nuclear periphery in various stages of spermatogenesis, but is not detectable in mature sperm. These data suggest that kj exerts an effect on sperm development, the loss of which results in reduced fertilization ability. While previous bioinformatic analyses suggested the kj gene was restricted to the melanogaster group of Drosophila, we identified putative orthologs with conserved synteny, male-biased expression, and predicted protein features across the genus, as well as instances of gene loss in some lineages. Thus, kj potentially arose in the Drosophila common ancestor and subsequently evolved an essential role in D. melanogaster. Our results demonstrate a new aspect of male reproduction that has been shaped by new gene evolution and provide a molecular foothold for further investigating the mechanism of sperm entry into eggs in Drosophila.
Affiliation(s)
- Sara Y Guay
  - Department of Biology, College of the Holy Cross, Worcester, MA 01610
- Prajal H Patel
  - Department of Biology, College of the Holy Cross, Worcester, MA 01610
- Jonathon M Thomalla
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853
- Kerry L McDermott
  - Department of Biology, College of the Holy Cross, Worcester, MA 01610
- Jillian M O'Toole
  - Department of Biology, College of the Holy Cross, Worcester, MA 01610
- Sarah E Arnold
  - Department of Biology, College of the Holy Cross, Worcester, MA 01610
- Sarah J Obrycki
  - Department of Biology, College of the Holy Cross, Worcester, MA 01610
- Mariana F Wolfner
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853
2
Vakirlis N, Acar O, Cherupally V, Carvunis AR. Ancestral Sequence Reconstruction as a Tool to Detect and Study De Novo Gene Emergence. Genome Biol Evol 2024; 16:evae151. [PMID: 39004885 PMCID: PMC11299112 DOI: 10.1093/gbe/evae151] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Revised: 06/17/2024] [Accepted: 07/09/2024] [Indexed: 07/16/2024]
Abstract
New protein-coding genes can evolve from previously noncoding genomic regions through a process known as de novo gene emergence. Evidence suggests that this process has likely occurred throughout evolution and across the tree of life. Yet, confidently identifying de novo emerged genes remains challenging. Ancestral sequence reconstruction is a promising approach for inferring whether a gene has emerged de novo or not, as it allows us to inspect whether a given genomic locus ancestrally harbored protein-coding capacity. However, the use of ancestral sequence reconstruction in the context of de novo emergence is still in its infancy and its capabilities, limitations, and overall potential are largely unknown. Notably, it is difficult to formally evaluate the protein-coding capacity of ancestral sequences, particularly when new gene candidates are short. How well-suited is ancestral sequence reconstruction as a tool for the detection and study of de novo genes? Here, we address this question by designing an ancestral sequence reconstruction workflow incorporating different tools and sets of parameters and by introducing a formal criterion that allows us to estimate, within a desired level of confidence, when protein-coding capacity originated at a particular locus. Applying this workflow to ∼2,600 short, annotated budding yeast genes (<1,000 nucleotides), we found that ancestral sequence reconstruction robustly predicts an ancient origin for the most widely conserved genes, which constitute "easy" cases. For less robust cases, we calculated a randomization-based empirical P-value estimating whether the observed conservation between the extant and ancestral reading frame could be attributed to chance. This formal criterion allowed us to pinpoint a branch of origin for most of the less robust cases, identifying 49 genes that can unequivocally be considered de novo originated since the split of the Saccharomyces genus, including 37 Saccharomyces cerevisiae-specific genes. We find that for the remaining equivocal cases we cannot rule out different evolutionary scenarios including rapid evolution, multiple gene losses, or a recent de novo origin. Overall, our findings suggest that ancestral sequence reconstruction is a valuable tool to study de novo gene emergence but should be applied with caution and awareness of its limitations.
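The randomization-based empirical P-value mentioned in this abstract can be illustrated with a generic sketch. This is not the authors' workflow: the similarity score (`frame_identity`, plain positional identity), the null model (shuffling the ancestral sequence while preserving its composition), and all function names are assumptions chosen only to show the general idea of comparing an observed conservation score against randomized controls.

```python
import random


def frame_identity(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences agree.

    Hypothetical stand-in for whatever reading-frame conservation score
    a real analysis would use.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def empirical_p_value(extant: str, ancestral: str,
                      n_shuffles: int = 999, seed: int = 0) -> float:
    """Estimate how often a shuffled ancestral sequence scores at least
    as high as the observed one (assumed null: composition-preserving shuffle).
    """
    rng = random.Random(seed)
    observed = frame_identity(extant, ancestral)
    residues = list(ancestral)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        if frame_identity(extant, "".join(residues)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    seq = "MKTLLVAGSPQRWEYFHDNC"  # made-up sequence, for illustration only
    print(empirical_p_value(seq, seq))
```

A small P-value here only says that the observed similarity is unlikely under this particular shuffling null; a real analysis would need a null model suited to reconstructed ancestral sequences.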
Affiliation(s)
- Nikolaos Vakirlis
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Omer Acar
  - Pittsburgh Center for Evolutionary Biology and Medicine, Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Vijay Cherupally
  - Pittsburgh Center for Evolutionary Biology and Medicine, Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Anne-Ruxandra Carvunis
  - Pittsburgh Center for Evolutionary Biology and Medicine, Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
3
Roginski P, Grandchamp A, Quignot C, Lopes A. De Novo Emerged Gene Search in Eukaryotes with DENSE. Genome Biol Evol 2024; 16:evae159. [PMID: 39212967 PMCID: PMC11363675 DOI: 10.1093/gbe/evae159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/07/2024] [Indexed: 09/04/2024]
Abstract
The discovery of de novo emerged genes, originating from previously noncoding DNA regions, challenges traditional views of species evolution. Indeed, the hypothesis of neutrally evolving sequences giving rise to functional proteins is highly unlikely. This conundrum has sparked numerous studies to quantify and characterize these genes, aiming to understand their functional roles and contributions to genome evolution. Yet, no fully automated pipeline for their identification is available. Therefore, we introduce DENSE (DE Novo emerged gene SEarch), an automated Nextflow pipeline based on two distinct steps: detection of taxonomically restricted genes (TRGs) through phylostratigraphy, and filtering of TRGs for de novo emerged genes via genome comparisons and synteny search. DENSE is available as a user-friendly command-line tool, while the second step is accessible through a web server upon providing a list of TRGs. Highly flexible, DENSE provides various strategy and parameter combinations, enabling users to adapt to specific configurations or define their own strategy through a rational framework, facilitating protocol communication, and study interoperability. We apply DENSE to seven model organisms, exploring the impact of its strategies and parameters on de novo gene predictions. This thorough analysis across species with different evolutionary rates reveals useful metrics for users to define input datasets, identify favorable/unfavorable conditions for de novo gene detection, and control potential biases in genome annotations. Additionally, predictions made for the seven model organisms are compiled into a requestable database, which we hope will serve as a reference for de novo emerged gene lists generated with specific criteria combinations.
Affiliation(s)
- Paul Roginski
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CEA, CNRS, 91198 Gif-sur-Yvette, France
- Anna Grandchamp
  - Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany
- Chloé Quignot
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CEA, CNRS, 91198 Gif-sur-Yvette, France
- Anne Lopes
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CEA, CNRS, 91198 Gif-sur-Yvette, France
4
Vakirlis N, Kupczok A. Large-scale investigation of species-specific orphan genes in the human gut microbiome elucidates their evolutionary origins. Genome Res 2024; 34:888-903. [PMID: 38977308 PMCID: PMC11293555 DOI: 10.1101/gr.278977.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Accepted: 06/12/2024] [Indexed: 07/10/2024]
Abstract
Species-specific genes, also known as orphans, are ubiquitous across life's domains. In prokaryotes, species-specific orphan genes (SSOGs) are mostly thought to originate in external elements such as viruses followed by horizontal gene transfer, whereas the scenario of native origination, through rapid divergence or de novo, is mostly dismissed. However, quantitative evidence supporting either scenario is lacking. Here, we systematically analyzed genomes from 4644 human gut microbiome species and identified more than 600,000 unique SSOGs, representing an average of 2.6% of a given species' pangenome. These sequences are mostly rare within each species yet show signs of purifying selection. Overall, SSOGs use optimal codons less frequently, and their proteins are more disordered than those of conserved genes (i.e., non-SSOGs). Importantly, across species, the GC content of SSOGs closely matches that of conserved ones. In contrast, the ∼5% of SSOGs that share similarity to known viral sequences have distinct characteristics, including lower GC content. Thus, SSOGs with similarity to viruses differ from the remaining SSOGs, contrasting an external origination scenario for most of them. By examining the orthologous genomic region in closely related species, we show that a small subset of SSOGs likely evolved natively de novo and find that these genes also differ in their properties from the remaining SSOGs. Our results challenge the notion that external elements are the dominant source of prokaryotic genetic novelty and will enable future studies into the biological role and relevance of species-specific genes in the human gut.
Affiliation(s)
- Nikolaos Vakirlis
  - Institute for Fundamental Biomedical Research, B.S.R.C. "Alexander Fleming", Vari 166 72, Greece
  - Institute for General Microbiology, Kiel University, 24118 Kiel, Germany
- Anne Kupczok
  - Bioinformatics Group, Wageningen University, 6700 PB Wageningen, The Netherlands
5
Zhao Q, Zheng Y, Li Y, Shi L, Zhang J, Ma D, You M. An Orphan Gene Enhances Male Reproductive Success in Plutella xylostella. Mol Biol Evol 2024; 41:msae142. [PMID: 38990889 PMCID: PMC11290247 DOI: 10.1093/molbev/msae142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Revised: 06/28/2024] [Accepted: 07/05/2024] [Indexed: 07/13/2024]
Abstract
Plutella xylostella exhibits exceptional reproductive ability, yet the genetic basis underlying its high reproductive capacity remains unknown. Here, we demonstrate that an orphan gene, lushu, which encodes a sperm protein, plays a crucial role in male reproductive success. Lushu is located on the Z chromosome and is prevalent across different P. xylostella populations worldwide. We subsequently generated lushu mutants using a transgenic CRISPR/Cas9 system. Knockout of lushu results in reduced male mating efficiency and accelerated death in adult males. Furthermore, our findings highlight that lushu deficiency reduced the transfer of sperm from males to females, potentially hindering sperm competition. Additionally, the knockout of lushu results in disrupted gene expression in energy-related pathways and elevated insulin levels in adult males. Our findings reveal that male reproductive performance has evolved through the birth of a newly evolved, lineage-specific gene with enormous potentiality in fecundity success. These insights hold valuable implications for identifying the target for genetic control, particularly in relation to species-specific traits that are pivotal in determining high levels of fecundity.
Affiliation(s)
- Qian Zhao
  - State Key Laboratory for Ecological Pest Control of Fujian/Taiwan Crops and College of Life Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - Ministerial and Provincial Joint Innovation Centre for Safety Production of Cross-Strait Crops, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - Joint International Research Laboratory of Ecological Pest Control, Ministry of Education, Fuzhou 350002, China
- Yahong Zheng
  - State Key Laboratory for Ecological Pest Control of Fujian/Taiwan Crops and College of Life Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Yiying Li
  - State Key Laboratory for Ecological Pest Control of Fujian/Taiwan Crops and College of Life Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Lingping Shi
  - State Key Laboratory for Ecological Pest Control of Fujian/Taiwan Crops and College of Life Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Jing Zhang
  - State Key Laboratory for Ecological Pest Control of Fujian/Taiwan Crops and College of Life Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - Ministerial and Provincial Joint Innovation Centre for Safety Production of Cross-Strait Crops, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - Joint International Research Laboratory of Ecological Pest Control, Ministry of Education, Fuzhou 350002, China
- Dongna Ma
  - State Key Laboratory for Ecological Pest Control of Fujian/Taiwan Crops and College of Life Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Minsheng You
  - State Key Laboratory for Ecological Pest Control of Fujian/Taiwan Crops and College of Life Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - Ministerial and Provincial Joint Innovation Centre for Safety Production of Cross-Strait Crops, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - Joint International Research Laboratory of Ecological Pest Control, Ministry of Education, Fuzhou 350002, China
6
Hughes ES, Tuck LR, He Z, Ballou ER, Wallace EWJ. A trade-off between proliferation and defense in the fungal pathogen Cryptococcus at alkaline pH is controlled by the transcription factor GAT201. bioRxiv 2024:2023.06.14.543486. [PMID: 37398450 PMCID: PMC10312749 DOI: 10.1101/2023.06.14.543486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
Cryptococcus is a fungal pathogen whose virulence relies on proliferation in and dissemination to host sites, and on synthesis of a defensive yet metabolically costly polysaccharide capsule. Regulatory pathways required for Cryptococcus virulence include a GATA-like transcription factor, Gat201, that regulates Cryptococcal virulence in both capsule-dependent and capsule-independent ways. Here we show that Gat201 is part of a negative regulatory pathway that limits fungal survival at alkaline pH. RNA-seq analysis found strong induction of GAT201 expression within minutes of transfer to RPMI media at alkaline pH. Microscopy, growth curves, and colony forming unit assays show that in RPMI at alkaline pH wild-type Cryptococcus neoformans yeast cells produce capsule but do not bud or maintain viability, while gat201Δ cells make buds and maintain viability, yet fail to produce capsule. GAT201 is required for transcriptional upregulation of a specific set of genes, the majority of which are direct Gat201 targets. Evolutionary analysis shows that Gat201 is in a subfamily of GATA-like transcription factors that is conserved within pathogenic fungi but absent in model yeasts. This work identifies the Gat201 pathway as controlling a trade-off between proliferation and production of defensive capsule. The assays established here will allow characterisation of the mechanisms of action of the Gat201 pathway. Together, our findings urge improved understanding of the regulation of proliferation as a driver of fungal pathogenesis.
Affiliation(s)
- Elizabeth S Hughes
  - Institute for Cell Biology, and Centre for Engineering Biology, School of Biological Sciences, The University of Edinburgh
- Laura R Tuck
  - Institute for Cell Biology, and Centre for Engineering Biology, School of Biological Sciences, The University of Edinburgh
- Zhenzhen He
  - Institute for Cell Biology, and Centre for Engineering Biology, School of Biological Sciences, The University of Edinburgh
- Edward W J Wallace
  - Institute for Cell Biology, and Centre for Engineering Biology, School of Biological Sciences, The University of Edinburgh
7
Chen J, Li Q, Xia S, Arsala D, Sosa D, Wang D, Long M. The Rapid Evolution of De Novo Proteins in Structure and Complex. Genome Biol Evol 2024; 16:evae107. [PMID: 38753069 PMCID: PMC11149777 DOI: 10.1093/gbe/evae107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/10/2024] [Indexed: 06/06/2024]
Abstract
Recent genome-wide studies in rice have established that de novo genes, evolving from noncoding sequences, enhance protein diversity through a stepwise process. However, the pattern and rate of their evolution in protein structure over time remain unclear. Here, we addressed these issues within a surprisingly short evolutionary timescale (<1 million years for 97% of Oryza de novo genes) with comparative approaches to gene duplicates. We found that de novo genes evolve faster than gene duplicates in the intrinsically disordered regions (such as random coils), secondary structure elements (such as α helix and β strand), hydrophobicity, and molecular recognition features. In de novo proteins, specifically, we observed an 8% to 14% decay in random coils and intrinsically disordered region lengths and a 2.3% to 6.5% increase in structured elements, hydrophobicity, and molecular recognition features, per million years on average. These patterns of structural evolution align with changes in amino acid composition over time as well. We also revealed higher positive charges but smaller molecular weights for de novo proteins than duplicates. Tertiary structure predictions showed that most de novo proteins, though not typically well folded on their own, readily form low-energy and compact complexes with other proteins facilitated by extensive residue contacts and conformational flexibility, suggesting a faster-binding scenario in de novo proteins to promote interaction. These analyses illuminate a rapid evolution of protein structure in de novo genes in rice genomes, originating from noncoding sequences, highlighting their quick transformation into active, protein complex-forming components within a remarkably short evolutionary timeframe.
Affiliation(s)
- Jianhai Chen
  - Department of Ecology and Evolution, The University of Chicago, Chicago, IL 60637, USA
- Qingrong Li
  - Division of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA
  - Department of Cellular & Molecular Medicine, School of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Shengqian Xia
  - Department of Ecology and Evolution, The University of Chicago, Chicago, IL 60637, USA
- Deanna Arsala
  - Department of Ecology and Evolution, The University of Chicago, Chicago, IL 60637, USA
- Dylan Sosa
  - Department of Ecology and Evolution, The University of Chicago, Chicago, IL 60637, USA
- Dong Wang
  - Division of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA
  - Department of Cellular & Molecular Medicine, School of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Manyuan Long
  - Department of Ecology and Evolution, The University of Chicago, Chicago, IL 60637, USA
8
Lee U, Mozeika SM, Zhao L. A Synergistic, Cultivator Model of De Novo Gene Origination. Genome Biol Evol 2024; 16:evae103. [PMID: 38748819 PMCID: PMC11152449 DOI: 10.1093/gbe/evae103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/12/2024] [Indexed: 06/07/2024]
Abstract
The origin and fixation of evolutionarily young genes is a fundamental question in evolutionary biology. However, understanding the origins of newly evolved genes arising de novo from noncoding genomic sequences is challenging. This is partly due to the low likelihood that several neutral or nearly neutral mutations fix prior to the appearance of an important novel molecular function. This issue is particularly exacerbated in large effective population sizes where the effect of drift is small. To address this problem, we propose a regulation-focused, cultivator model for de novo gene evolution. This cultivator-focused model posits that each step in a novel variant's evolutionary trajectory is driven by well-defined, selectively advantageous functions for the cultivator genes, rather than solely by the de novo genes, emphasizing the critical role of genome organization in the evolution of new genes.
Affiliation(s)
- UnJin Lee
  - Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA
- Shawn M Mozeika
  - Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA
- Li Zhao
  - Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA
9
Ilík V, Schwarz EM, Nosková E, Pafčo B. Hookworm genomics: dusk or dawn? Trends Parasitol 2024; 40:452-465. [PMID: 38677925 DOI: 10.1016/j.pt.2024.04.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Revised: 03/28/2024] [Accepted: 04/04/2024] [Indexed: 04/29/2024]
Abstract
Hookworms are parasites, closely related to the model nematode Caenorhabditis elegans, that are a major economic and health burden worldwide. Primarily three hookworm species (Necator americanus, Ancylostoma duodenale, and Ancylostoma ceylanicum) infect humans. Another 100 hookworm species from 19 genera infect primates, ruminants, and carnivores. Genetic data exist for only seven of these species. Genome sequences are available from only four of these species in two genera, leaving 96 others (particularly those parasitizing wildlife) without any genomic data. The most recent hookworm genomes were published 5 years ago, leaving the field in a dusk. However, assembling genomes from single hookworms may bring a new dawn. Here we summarize advances, challenges, and opportunities for studying these neglected but important parasitic nematodes.
Affiliation(s)
- Vladislav Ilík
  - Institute of Vertebrate Biology, Czech Academy of Sciences, Brno, Czech Republic; Department of Botany and Zoology, Faculty of Science, Masaryk University, Brno, Czech Republic
- Erich M Schwarz
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA
- Eva Nosková
  - Institute of Vertebrate Biology, Czech Academy of Sciences, Brno, Czech Republic; Department of Botany and Zoology, Faculty of Science, Masaryk University, Brno, Czech Republic
- Barbora Pafčo
  - Institute of Vertebrate Biology, Czech Academy of Sciences, Brno, Czech Republic
10
Ferguson S, Jones A, Murray K, Andrew R, Schwessinger B, Borevitz J. Plant genome evolution in the genus Eucalyptus is driven by structural rearrangements that promote sequence divergence. Genome Res 2024; 34:606-619. [PMID: 38589251 PMCID: PMC11146599 DOI: 10.1101/gr.277999.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Accepted: 03/22/2024] [Indexed: 04/10/2024]
Abstract
Genomes have a highly organized architecture (nonrandom organization of functional and nonfunctional genetic elements within chromosomes) that is essential for many biological functions, particularly gene expression and reproduction. Despite the need to conserve genome architecture, a high level of structural variation has been observed within species. As species separate and diverge, genome architecture also diverges, becoming increasingly poorly conserved as divergence time increases. However, within plant genomes, the processes of genome architecture divergence are not well described. Here we use long-read sequencing and de novo assembly of 33 phylogenetically diverse, wild and naturally evolving Eucalyptus species, covering 1-50 million years of diverging genome evolution, to measure genome architectural conservation and describe architectural divergence. The investigation of these genomes revealed that following lineage divergence, genome architecture is highly fragmented by rearrangements. As genomes continue to diverge, the accumulation of mutations and the subsequent divergence beyond recognition of rearrangements become the primary driver of genome divergence. The loss of syntenic regions also contributes to genome divergence but at a slower pace than that of rearrangements. We hypothesize that duplications and translocations are potentially the greatest contributors to Eucalyptus genome divergence.
Affiliation(s)
- Scott Ferguson
  - Research School of Biology, Australian National University, Canberra, Australian Capital Territory, 2601, Australia
- Ashley Jones
  - Research School of Biology, Australian National University, Canberra, Australian Capital Territory, 2601, Australia
- Kevin Murray
  - Research School of Biology, Australian National University, Canberra, Australian Capital Territory, 2601, Australia
  - Weigel Department, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
- Rose Andrew
  - Botany & N.C.W. Beadle Herbarium, School of Environmental and Rural Science, University of New England, Armidale, New South Wales 2351, Australia
- Benjamin Schwessinger
  - Research School of Biology, Australian National University, Canberra, Australian Capital Territory, 2601, Australia
- Justin Borevitz
  - Research School of Biology, Australian National University, Canberra, Australian Capital Territory, 2601, Australia
11
Wright E. Accurately clustering biological sequences in linear time by relatedness sorting. Nat Commun 2024; 15:3047. [PMID: 38589369 PMCID: PMC11001989 DOI: 10.1038/s41467-024-47371-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Accepted: 03/28/2024] [Indexed: 04/10/2024]
Abstract
Clustering biological sequences into similar groups is an increasingly important task as the number of available sequences continues to grow exponentially. Search-based approaches to clustering scale super-linearly with the number of input sequences, making it impractical to cluster very large sets of sequences. Approaches to clustering sequences in linear time currently lack the accuracy of super-linear approaches. Here, I set out to develop and characterize a strategy for clustering with linear time complexity that retains the accuracy of less scalable approaches. The resulting algorithm, named Clusterize, sorts sequences by relatedness to linearize the clustering problem. Clusterize produces clusters with accuracy rivaling popular programs (CD-HIT, MMseqs2, and UCLUST) but exhibits linear asymptotic scalability. Clusterize generates higher accuracy and oftentimes much larger clusters than Linclust, a fast linear time clustering algorithm. I demonstrate the utility of Clusterize by accurately solving different clustering problems involving millions of nucleotide or protein sequences.
Affiliation(s)
- Erik Wright
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
  - Center for Evolutionary Biology and Medicine, Pittsburgh, PA, USA
12
Kashlan OB, Wang XP, Sheng S, Kleyman TR. Epithelial Na+ Channels Function as Extracellular Sensors. Compr Physiol 2024; 14:1-41. [PMID: 39109974 PMCID: PMC11309579 DOI: 10.1002/cphy.c230015] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 08/10/2024]
Abstract
The epithelial Na + channel (ENaC) resides on the apical surfaces of specific epithelia in vertebrates and plays a critical role in extracellular fluid homeostasis. Evidence that ENaC senses the external environment emerged well before the molecular identity of the channel was reported three decades ago. This article discusses progress toward elucidating the mechanisms through which specific external factors regulate ENaC function, highlighting insights gained from structural studies of ENaC and related family members. It also reviews our understanding of the role of ENaC regulation by the extracellular environment in physiology and disease. After familiarizing the reader with the channel's physiological roles and structure, we describe the central role protein allostery plays in ENaC's sensitivity to the external environment. We then discuss each of the extracellular factors that directly regulate the channel: proteases, cations and anions, shear stress, and other regulators specific to particular extracellular compartments. For each regulator, we discuss the initial observations that led to discovery, studies investigating molecular mechanism, and the physiological and pathophysiological implications of regulation. © 2024 American Physiological Society. Compr Physiol 14:5407-5447, 2024.
Affiliation(s)
- Ossama B. Kashlan
- Department of Medicine, Renal-Electrolyte Division,
University of Pittsburgh, Pittsburgh, Pennsylvania
- Department of Computational and Systems Biology, University
of Pittsburgh, Pittsburgh, Pennsylvania
| | - Xue-Ping Wang
- Department of Medicine, Renal-Electrolyte Division,
University of Pittsburgh, Pittsburgh, Pennsylvania
| | - Shaohu Sheng
- Department of Medicine, Renal-Electrolyte Division,
University of Pittsburgh, Pittsburgh, Pennsylvania
| | - Thomas R. Kleyman
- Department of Medicine, Renal-Electrolyte Division,
University of Pittsburgh, Pittsburgh, Pennsylvania
- Department of Cell Biology, University of Pittsburgh,
Pittsburgh, Pennsylvania
- Department of Pharmacology and Chemical Biology, University
of Pittsburgh, Pittsburgh, Pennsylvania
| |
|
13
|
Hannon Bozorgmehr J. Four classic "de novo" genes all have plausible homologs and likely evolved from retro-duplicated or pseudogenic sequences. Mol Genet Genomics 2024; 299:6. [PMID: 38315248 DOI: 10.1007/s00438-023-02090-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2023] [Accepted: 10/15/2023] [Indexed: 02/07/2024]
Abstract
Despite being previously regarded as extremely unlikely, the idea that entirely novel protein-coding genes can emerge from non-coding sequences has gradually become accepted over the past two decades. Examples of "de novo origination", resulting in lineage-specific "orphan" genes, lacking coding orthologs, are now produced every year. However, many are likely cases of duplicates that are difficult to recognize. Here, I re-examine the claims and show that four very well-known examples of genes alleged to have emerged completely "from scratch"- FLJ33706 in humans, Goddard in fruit flies, BSC4 in baker's yeast and AFGP2 in codfish-may have plausible evolutionary ancestors in pre-existing genes. The first two are likely highly diverged retrogenes coding for regulatory proteins that have been misidentified as orphans. The antifreeze glycoprotein, moreover, may not have evolved from repetitive non-genic sequences but, as in several other related cases, from an apolipoprotein that could have become pseudogenized before later being reactivated. These findings detract from various claims made about de novo gene birth and show there has been a tendency not to invest the necessary effort in searching for homologs outside of a very limited syntenic or phylostratigraphic methodology. A robust approach is used for improving detection that draws upon similarities, not just in terms of statistical sequence analysis, but also relating to biochemistry and function, to obviate notable failures to identify homologs.
|
14
|
O'Meara MJ, Rapala JR, Nichols CB, Alexandre AC, Billmyre RB, Steenwyk JL, Alspaugh JA, O'Meara TR. CryptoCEN: A Co-Expression Network for Cryptococcus neoformans reveals novel proteins involved in DNA damage repair. PLoS Genet 2024; 20:e1011158. [PMID: 38359090 PMCID: PMC10901339 DOI: 10.1371/journal.pgen.1011158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Revised: 02/28/2024] [Accepted: 01/30/2024] [Indexed: 02/17/2024] Open
Abstract
Elucidating gene function is a major goal in biology, especially among non-model organisms. However, doing so is complicated by the fact that molecular conservation does not always mirror functional conservation, and that complex relationships among genes are responsible for encoding pathways and higher-order biological processes. Co-expression, a promising approach for predicting gene function, relies on the general principle that genes with similar expression patterns across multiple conditions will likely be involved in the same biological process. For Cryptococcus neoformans, a prevalent human fungal pathogen greatly diverged from model yeasts, approximately 60% of the predicted genes in the genome lack functional annotations. Here, we leveraged a large amount of publicly available transcriptomic data to generate a C. neoformans Co-Expression Network (CryptoCEN), successfully recapitulating known protein networks, predicting gene function, and enabling insights into the principles influencing co-expression. With 100% predictive accuracy, we used CryptoCEN to identify 13 new DNA damage response genes, underscoring the utility of guilt-by-association for determining gene function. Overall, co-expression is a powerful tool for uncovering gene function, and decreases the experimental tests needed to identify functions for currently under-annotated genes.
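The guilt-by-association principle described in this abstract can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the CryptoCEN pipeline: the use of Pearson correlation, the cutoff `min_r`, the function names, and the gene labels and expression values are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def coexpression_edges(expr, min_r=0.9):
    """Link every pair of genes whose expression correlates at or above min_r."""
    genes = sorted(expr)
    edges = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            r = pearson(expr[g], expr[h])
            if r >= min_r:
                edges.append((g, h, r))
    return edges


def guilt_by_association(edges, annotations):
    """Propose functions for unannotated genes from annotated neighbours."""
    predictions = {}
    for g, h, _ in edges:
        for known, unknown in ((g, h), (h, g)):
            if known in annotations and unknown not in annotations:
                predictions.setdefault(unknown, set()).add(annotations[known])
    return predictions


if __name__ == "__main__":
    # Hypothetical expression values across four conditions.
    expr = {
        "geneA": [1.0, 2.0, 3.0, 4.0],
        "geneB": [2.0, 4.0, 6.0, 8.5],
        "geneC": [4.0, 3.0, 2.0, 1.0],
    }
    edges = coexpression_edges(expr)
    print(guilt_by_association(edges, {"geneA": "DNA damage response"}))
```

In this toy data set only geneA and geneB are positively correlated, so the annotation of geneA is proposed for geneB; any such prediction would still need experimental testing.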
Affiliation(s)
- Matthew J O'Meara
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Jackson R Rapala
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Connie B Nichols
- Departments of Medicine and Molecular Genetics/Microbiology; and Cell Biology, Duke University School of Medicine, Durham, North Carolina, United States of America
| | - A Christina Alexandre
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
| | - R Blake Billmyre
- Departments of Pharmaceutical and Biomedical Sciences/Infectious Disease, College of Pharmacy/College of Veterinary Medicine, University of Georgia, Athens, Georgia, United States of America
| | - Jacob L Steenwyk
- Howard Hughes Medical Institute and the Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California, United States of America
| | - J Andrew Alspaugh
- Departments of Medicine and Molecular Genetics/Microbiology; and Cell Biology, Duke University School of Medicine, Durham, North Carolina, United States of America
| | - Teresa R O'Meara
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
| |
|
15
|
Peng J, Zhao L. The origin and structural evolution of de novo genes in Drosophila. Nat Commun 2024; 15:810. [PMID: 38280868 PMCID: PMC10821953 DOI: 10.1038/s41467-024-45028-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Accepted: 01/09/2024] [Indexed: 01/29/2024] Open
Abstract
Recent studies reveal that de novo gene origination from previously non-genic sequences is a common mechanism for gene innovation. These young genes provide an opportunity to study the structural and functional origins of proteins. Here, we combine high-quality base-level whole-genome alignments and computational structural modeling to study the origination, evolution, and protein structures of lineage-specific de novo genes. We identify 555 de novo gene candidates in D. melanogaster that originated within the Drosophilinae lineage. Sequence composition, evolutionary rates, and expression patterns indicate possible gradual functional or adaptive shifts with their gene ages. Surprisingly, we find little overall protein structural changes in candidates from the Drosophilinae lineage. We identify several candidates with potentially well-folded protein structures. Ancestral sequence reconstruction analysis reveals that most potentially well-folded candidates are often born well-folded. Single-cell RNA-seq analysis in testis shows that although most de novo gene candidates are enriched in spermatocytes, several young candidates are biased towards the early spermatogenesis stage, indicating potentially important but less emphasized roles of early germline cells in the de novo gene origination in testis. This study provides a systematic overview of the origin, evolution, and protein structural changes of Drosophilinae-specific de novo genes.
Affiliation(s)
- Junhui Peng
- Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA
| | - Li Zhao
- Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA.
| |
|
16
|
Maurer-Alcalá XX, Cote-L’Heureux A, Kosakovsky Pond SL, Katz LA. Somatic genome architecture and molecular evolution are decoupled in "young" linage-specific gene families in ciliates. PLoS One 2024; 19:e0291688. [PMID: 38271450 PMCID: PMC10810533 DOI: 10.1371/journal.pone.0291688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2023] [Accepted: 09/02/2023] [Indexed: 01/27/2024] Open
Abstract
The evolution of lineage-specific gene families remains poorly studied across the eukaryotic tree of life, with most analyses focusing on the recent evolution of de novo genes in model species. Here we explore the origins of lineage-specific genes in ciliates, a ~1 billion year old clade of microeukaryotes that are defined by their division of somatic and germline functions into distinct nuclei. Previous analyses on conserved gene families have shown the effect of ciliates' unusual genome architecture on gene family evolution: extensive genome processing (the generation of thousands of gene-sized somatic chromosomes from canonical germline chromosomes) is associated with larger and more diverse gene families. To further study the relationship between ciliate genome architecture and gene family evolution, we analyzed lineage-specific gene families from a set of 46 transcriptomes and 12 genomes representing x species from eight ciliate classes. We assess how the evolution of lineage-specific gene families occurs among four groups of ciliates: extensive fragmenters with gene-size somatic chromosomes, non-extensive fragmenters with "large" multi-gene somatic chromosomes, Heterotrichea with highly polyploid somatic genomes and Karyorelictea with 'paradiploid' somatic genomes. Our analyses demonstrate that: 1) most lineage-specific gene families are found at shallow taxonomic scales; 2) extensive genome processing (i.e., gene unscrambling) during development likely influences the size and number of young lineage-specific gene families; and 3) the influence of somatic genome architecture on molecular evolution is increasingly apparent in older gene families. Altogether, these data highlight the influences of genome architecture on the evolution of lineage-specific gene families in eukaryotes.
Affiliation(s)
- Xyrus X. Maurer-Alcalá
- Institute of Cell Biology, University of Bern, Bern, Switzerland
- Department of Invertebrate Zoology, American Museum of Natural History, New York, New York, United States of America
| | - Auden Cote-L’Heureux
- Department of Biological Sciences, Smith College, Northampton, Massachusetts, United States of America
| | - Sergei L. Kosakovsky Pond
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania, United States of America
| | - Laura A. Katz
- Department of Biological Sciences, Smith College, Northampton, Massachusetts, United States of America
- Program in Organismic and Evolutionary Biology, University of Massachusetts Amherst, Amherst, Massachusetts, United States of America
| |
|
17
|
Goodheart JA, Rio RA, Taraporevala NF, Fiorenza RA, Barnes SR, Morrill K, Jacob MAC, Whitesel C, Masterson P, Batzel GO, Johnston HT, Ramirez MD, Katz PS, Lyons DC. A chromosome-level genome for the nudibranch gastropod Berghia stephanieae helps parse clade-specific gene expression in novel and conserved phenotypes. BMC Biol 2024; 22:9. [PMID: 38233809 PMCID: PMC10795318 DOI: 10.1186/s12915-024-01814-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Accepted: 01/03/2024] [Indexed: 01/19/2024] Open
Abstract
BACKGROUND How novel phenotypes originate from conserved genes, processes, and tissues remains a major question in biology. Research that sets out to answer this question often focuses on the conserved genes and processes involved, an approach that explicitly excludes the impact of genetic elements that may be classified as clade-specific, even though many of these genes are known to be important for many novel, or clade-restricted, phenotypes. This is especially true for understudied phyla such as mollusks, where limited genomic and functional biology resources for members of this phylum have long hindered assessments of genetic homology and function. To address this gap, we constructed a chromosome-level genome for the gastropod Berghia stephanieae (Valdés, 2005) to investigate the expression of clade-specific genes across both novel and conserved tissue types in this species. RESULTS The final assembled and filtered Berghia genome is comparable to other high-quality mollusk genomes in terms of size (1.05 Gb) and number of predicted genes (24,960 genes) and is highly contiguous. The proportion of upregulated, clade-specific genes varied across tissues, but with no clear trend between the proportion of clade-specific genes and the novelty of the tissue. However, more complex tissue like the brain had the highest total number of upregulated, clade-specific genes, though the ratio of upregulated clade-specific genes to the total number of upregulated genes was low. CONCLUSIONS Our results, when combined with previous research on the impact of novel genes on phenotypic evolution, highlight the fact that the complexity of the novel tissue or behavior, the type of novelty, and the developmental timing of evolutionary modifications will all influence how novel and conserved genes interact to generate diversity.
Affiliation(s)
- Jessica A Goodheart
- Division of Invertebrate Zoology, American Museum of Natural History, New York, NY, USA.
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA.
| | - Robin A Rio
- Bioengineering Department, Stanford University, Stanford, CA, USA
| | - Neville F Taraporevala
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
- Department of Wildland Resources, Utah State University, Logan, UT, USA
| | - Rose A Fiorenza
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
| | - Seth R Barnes
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
| | - Kevin Morrill
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
| | - Mark Allan C Jacob
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
| | - Carl Whitesel
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
| | - Park Masterson
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
| | - Grant O Batzel
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
| | - Hereroa T Johnston
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
| | - M Desmond Ramirez
- Department of Biology, University of Massachusetts Amherst, Amherst, MA, USA
- Institute of Neuroscience, University of Oregon, Eugene, OR, USA
| | - Paul S Katz
- Department of Biology, University of Massachusetts Amherst, Amherst, MA, USA
| | - Deirdre C Lyons
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA.
| |
|
18
|
Svedberg D, Winiger RR, Berg A, Sharma H, Tellgren-Roth C, Debrunner-Vossbrinck BA, Vossbrinck CR, Barandun J. Functional annotation of a divergent genome using sequence and structure-based similarity. BMC Genomics 2024; 25:6. [PMID: 38166563 PMCID: PMC10759460 DOI: 10.1186/s12864-023-09924-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2023] [Accepted: 12/18/2023] [Indexed: 01/04/2024] Open
Abstract
BACKGROUND Microsporidia are a large taxon of intracellular pathogens characterized by extraordinarily streamlined genomes with unusually high sequence divergence and many species-specific adaptations. These unique factors pose challenges for traditional genome annotation methods based on sequence similarity. As a result, many of the microsporidian genomes sequenced to date contain numerous genes of unknown function. Recent innovations in rapid and accurate structure prediction and comparison, together with the growing amount of data in structural databases, provide new opportunities to assist in the functional annotation of newly sequenced genomes. RESULTS In this study, we established a workflow that combines sequence and structure-based functional gene annotation approaches employing a ChimeraX plugin named ANNOTEX (Annotation Extension for ChimeraX), allowing for visual inspection and manual curation. We employed this workflow on a high-quality telomere-to-telomere sequenced tetraploid genome of Vairimorpha necatrix. First, the 3080 predicted protein-coding DNA sequences, of which 89% were confirmed with RNA sequencing data, were used as input. Next, ColabFold was used to create protein structure predictions, followed by a Foldseek search for structural matching to the PDB and AlphaFold databases. The subsequent manual curation, using sequence and structure-based hits, increased the accuracy and quality of the functional genome annotation compared to results using only traditional annotation tools. Our workflow resulted in a comprehensive description of the V. necatrix genome, along with a structural summary of the most prevalent protein groups, such as the ricin B lectin family. In addition, and to test our tool, we identified the functions of several previously uncharacterized Encephalitozoon cuniculi genes. 
CONCLUSION We provide a new functional annotation tool for divergent organisms and employ it on a newly sequenced, high-quality microsporidian genome to shed light on this uncharacterized intracellular pathogen of Lepidoptera. The addition of a structure-based annotation approach can serve as a valuable template for studying other microsporidian or similarly divergent species.
Affiliation(s)
- Dennis Svedberg
- Department of Molecular Biology, The Laboratory for Molecular Infection Medicine Sweden (MIMS), Science for Life Laboratory, Umeå Centre for Microbial Research (UCMR), Umeå University, Umeå, 90187, Sweden
- Department of Medical Biochemistry and Biophysics, Umeå University, Umeå, 90736, Sweden
| | - Rahel R Winiger
- Department of Molecular Biology, The Laboratory for Molecular Infection Medicine Sweden (MIMS), Science for Life Laboratory, Umeå Centre for Microbial Research (UCMR), Umeå University, Umeå, 90187, Sweden
| | - Alexandra Berg
- Department of Molecular Biology, The Laboratory for Molecular Infection Medicine Sweden (MIMS), Science for Life Laboratory, Umeå Centre for Microbial Research (UCMR), Umeå University, Umeå, 90187, Sweden
- Department of Medical Biochemistry and Biophysics, Umeå University, Umeå, 90736, Sweden
| | - Himanshu Sharma
- Department of Molecular Biology, The Laboratory for Molecular Infection Medicine Sweden (MIMS), Science for Life Laboratory, Umeå Centre for Microbial Research (UCMR), Umeå University, Umeå, 90187, Sweden
- Department of Medical Biochemistry and Biophysics, Umeå University, Umeå, 90736, Sweden
| | - Christian Tellgren-Roth
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
| | | | - Charles R Vossbrinck
- Department of Environmental Science, Connecticut Agricultural Experiment Station, New Haven, CT, 06504, USA
| | - Jonas Barandun
- Department of Molecular Biology, The Laboratory for Molecular Infection Medicine Sweden (MIMS), Science for Life Laboratory, Umeå Centre for Microbial Research (UCMR), Umeå University, Umeå, 90187, Sweden.
| |
|
19
|
Novikova PV, Bhanu Busi S, Probst AJ, May P, Wilmes P. Functional prediction of proteins from the human gut archaeome. ISME COMMUNICATIONS 2024; 4:ycad014. [PMID: 38486809 PMCID: PMC10939349 DOI: 10.1093/ismeco/ycad014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Revised: 12/16/2023] [Accepted: 12/19/2023] [Indexed: 03/17/2024]
Abstract
The human gastrointestinal tract contains diverse microbial communities, including archaea. Among them, Methanobrevibacter smithii represents a highly active and clinically relevant methanogenic archaeon, being involved in gastrointestinal disorders, such as inflammatory bowel disease and obesity. Herein, we present an integrated approach using sequence and structure information to improve the annotation of M. smithii proteins using advanced protein structure prediction and annotation tools, such as AlphaFold2, trRosetta, ProFunc, and DeepFri. Of an initial set of 873 481 archaeal proteins, we found 707 754 proteins exclusively present in the human gut. Having analysed archaeal proteins together with 87 282 994 bacterial proteins, we identified unique archaeal proteins and archaeal-bacterial homologs. We then predicted and characterized functional domains and structures of 73 unique and homologous archaeal protein clusters linked to the human gut and M. smithii. We refined annotations based on the predicted structures, extending existing sequence similarity-based annotations. We identified gut-specific archaeal proteins that may be involved in defense mechanisms, virulence, adhesion, and the degradation of toxic substances. Interestingly, we identified potential glycosyltransferases that could be associated with N-linked and O-glycosylation. Additionally, we found preliminary evidence for interdomain horizontal gene transfer between Clostridia species and M. smithii, which includes sporulation Stage V proteins AE and AD. Our study broadens the understanding of archaeal biology, particularly M. smithii, and highlights the importance of considering both sequence and structure for the prediction of protein function.
Affiliation(s)
- Polina V Novikova
- Systems Ecology, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette L-4362, Luxembourg
| | - Susheel Bhanu Busi
- Systems Ecology, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette L-4362, Luxembourg
- UK Centre for Ecology and Hydrology, Wallingford, OX10 8 BB, United Kingdom
| | - Alexander J Probst
- Environmental Metagenomics, Department of Chemistry, Research Center One Health Ruhr of the University Alliance Ruhr, for Environmental Microbiology and Biotechnology, University Duisburg-Essen, Duisburg 47057, Germany
| | - Patrick May
- Bioinformatics Core, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette L-4362, Luxembourg
| | - Paul Wilmes
- Systems Ecology, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette L-4362, Luxembourg
- Department of Life Sciences and Medicine, Faculty of Science, Technology and Medicine, University of Luxembourg, Esch-sur-Alzette L-4362, Luxembourg
| |
|
20
|
Kafri M, Patena W, Martin L, Wang L, Gomer G, Ergun SL, Sirkejyan AK, Goh A, Wilson AT, Gavrilenko SE, Breker M, Roichman A, McWhite CD, Rabinowitz JD, Cross FR, Wühr M, Jonikas MC. Systematic identification and characterization of genes in the regulation and biogenesis of photosynthetic machinery. Cell 2023; 186:5638-5655.e25. [PMID: 38065083 PMCID: PMC10760936 DOI: 10.1016/j.cell.2023.11.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/18/2022] [Revised: 08/03/2023] [Accepted: 11/03/2023] [Indexed: 12/18/2023]
Abstract
Photosynthesis is central to food production and the Earth's biogeochemistry, yet the molecular basis for its regulation remains poorly understood. Here, using high-throughput genetics in the model eukaryotic alga Chlamydomonas reinhardtii, we identify with high confidence (false discovery rate [FDR] < 0.11) 70 poorly characterized genes required for photosynthesis. We then enable the functional characterization of these genes by providing a resource of proteomes of mutant strains, each lacking one of these genes. The data allow assignment of 34 genes to the biogenesis or regulation of one or more specific photosynthetic complexes. Further analysis uncovers biogenesis/regulatory roles for at least seven proteins, including five photosystem I mRNA maturation factors, the chloroplast translation factor MTF1, and the master regulator PMR1, which regulates chloroplast genes via nuclear-expressed factors. Our work provides a rich resource identifying regulatory and functional genes and placing them into pathways, thereby opening the door to a system-level understanding of photosynthesis.
Affiliation(s)
- Moshe Kafri
- Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA
| | - Weronika Patena
- Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA
| | - Lance Martin
- Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA; Lewis-Sigler Institute for Integrative Genomics and Department of Chemistry, Princeton University, Princeton, NJ 08544, USA
| | - Lianyong Wang
- Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA
| | - Gillian Gomer
- Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA
| | - Sabrina L Ergun
- Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA; Howard Hughes Medical Institute, Princeton University, Princeton, NJ 08544, USA
| | - Arthur K Sirkejyan
- Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA
| | - Audrey Goh
- Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA
| | - Alexandra T Wilson
- Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA
| | - Sophia E Gavrilenko
- Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA
| | - Michal Breker
- Laboratory of Cell Cycle Genetics, The Rockefeller University, New York, NY 10021, USA
| | - Asael Roichman
- Lewis-Sigler Institute for Integrative Genomics and Department of Chemistry, Princeton University, Princeton, NJ 08544, USA
| | - Claire D McWhite
- Lewis-Sigler Institute for Integrative Genomics and Department of Chemistry, Princeton University, Princeton, NJ 08544, USA
| | - Joshua D Rabinowitz
- Lewis-Sigler Institute for Integrative Genomics and Department of Chemistry, Princeton University, Princeton, NJ 08544, USA
| | - Frederick R Cross
- Laboratory of Cell Cycle Genetics, The Rockefeller University, New York, NY 10021, USA
| | - Martin Wühr
- Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA; Lewis-Sigler Institute for Integrative Genomics and Department of Chemistry, Princeton University, Princeton, NJ 08544, USA
| | - Martin C Jonikas
- Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA; Howard Hughes Medical Institute, Princeton University, Princeton, NJ 08544, USA.
| |
|
21
|
Steenwyk JL, Li Y, Zhou X, Shen XX, Rokas A. Incongruence in the phylogenomics era. Nat Rev Genet 2023; 24:834-850. [PMID: 37369847 DOI: 10.1038/s41576-023-00620-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Accepted: 05/19/2023] [Indexed: 06/29/2023]
Abstract
Genome-scale data and the development of novel statistical phylogenetic approaches have greatly aided the reconstruction of a broad sketch of the tree of life and resolved many of its branches. However, incongruence - the inference of conflicting evolutionary histories - remains pervasive in phylogenomic data, hampering our ability to reconstruct and interpret the tree of life. Biological factors, such as incomplete lineage sorting, horizontal gene transfer, hybridization, introgression, recombination and convergent molecular evolution, can lead to gene phylogenies that differ from the species tree. In addition, analytical factors, including stochastic, systematic and treatment errors, can drive incongruence. Here, we review these factors, discuss methodological advances to identify and handle incongruence, and highlight avenues for future research.
Affiliation(s)
- Jacob L Steenwyk
- Howard Hughes Medical Institute and the Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
- Vanderbilt Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN, USA
| | - Yuanning Li
- Institute of Marine Science and Technology, Shandong University, Qingdao, China
| | - Xiaofan Zhou
- Guangdong Laboratory for Lingnan Modern Agriculture, Guangdong Province Key Laboratory of Microbial Signals and Disease Control, Integrative Microbiology Research Centre, South China Agricultural University, Guangzhou, China
| | - Xing-Xing Shen
- Key Laboratory of Biology of Crop Pathogens and Insects of Zhejiang Province, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
| | - Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA.
- Vanderbilt Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN, USA.
- Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.
| |
|
22
|
Frumkin I, Laub MT. Selection of a de novo gene that can promote survival of Escherichia coli by modulating protein homeostasis pathways. Nat Ecol Evol 2023; 7:2067-2079. [PMID: 37945946 PMCID: PMC10697842 DOI: 10.1038/s41559-023-02224-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/11/2023] [Accepted: 09/12/2023] [Indexed: 11/12/2023]
Abstract
Cellular novelty can emerge when non-functional loci become functional genes in a process termed de novo gene birth. But how proteins with random amino acid sequences beneficially integrate into existing cellular pathways remains poorly understood. We screened ~10^8 genes, generated from random nucleotide sequences and devoid of homology to natural genes, for their ability to rescue growth arrest of Escherichia coli cells producing the ribonuclease toxin MazF. We identified ~2,000 genes that could promote growth, probably by reducing transcription from the promoter driving toxin expression. Additionally, one random protein, named Random antitoxin of MazF (RamF), modulated protein homeostasis by interacting with chaperones, leading to MazF proteolysis and a consequent loss of its toxicity. Finally, we demonstrate that random proteins can improve during evolution by identifying beneficial mutations that turned RamF into a more efficient inhibitor. Our work provides a mechanistic basis for how de novo gene birth can produce functional proteins that effectively benefit cells evolving under stress.
Affiliation(s)
- Idan Frumkin
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Michael T Laub
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Howard Hughes Medical Institute, Cambridge, MA, USA
23
Hamamsy T, Barot M, Morton JT, Steinegger M, Bonneau R, Cho K. Learning sequence, structure, and function representations of proteins with language models. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.26.568742. [PMID: 38045331 PMCID: PMC10690258 DOI: 10.1101/2023.11.26.568742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
The sequence-structure-function relationships that ultimately generate the diversity of extant observed proteins are complex, as proteins bridge the gap between multiple informational and physical scales involved in nearly all cellular processes. One limitation of existing protein annotation databases such as UniProt is that less than 1% of proteins have experimentally verified functions, and computational methods are needed to fill in the missing information. Here, we demonstrate that a multi-aspect framework based on protein language models can learn sequence-structure-function representations of amino acid sequences, and can provide the foundation for sensitive sequence-structure-function aware protein sequence search and annotation. Based on this model, we introduce a multi-aspect information retrieval system for proteins, Protein-Vec, covering sequence, structure, and function aspects, that enables computational protein annotation and function prediction at tree-of-life scales.
24
Goodheart JA, Rio RA, Taraporevala NF, Fiorenza RA, Barnes SR, Morrill K, Jacob MAC, Whitesel C, Masterson P, Batzel GO, Johnston HT, Ramirez MD, Katz PS, Lyons DC. A chromosome-level genome for the nudibranch gastropod Berghia stephanieae helps parse clade-specific gene expression in novel and conserved phenotypes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.04.552006. [PMID: 38014205 PMCID: PMC10680569 DOI: 10.1101/2023.08.04.552006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
How novel phenotypes originate from conserved genes, processes, and tissues remains a major question in biology. Research that sets out to answer this question often focuses on the conserved genes and processes involved, an approach that explicitly excludes the impact of genetic elements that may be classified as clade-specific, even though many of these genes are known to be important for many novel, or clade-restricted, phenotypes. This is especially true for understudied phyla such as mollusks, where limited genomic and functional biology resources for members of this phylum have long hindered assessments of genetic homology and function. To address this gap, we constructed a chromosome-level genome for the gastropod Berghia stephanieae (Valdés, 2005) to investigate the expression of clade-specific genes across both novel and conserved tissue types in this species. The final assembled and filtered Berghia genome is comparable to other high quality mollusk genomes in terms of size (1.05 Gb) and number of predicted genes (24,960 genes), and is highly contiguous. The proportion of upregulated, clade-specific genes varied across tissues, but with no clear trend between the proportion of clade-specific genes and the novelty of the tissue. However, more complex tissues like the brain had the highest total number of upregulated, clade-specific genes, though the ratio of upregulated clade-specific genes to the total number of upregulated genes was low. Our results, when combined with previous research on the impact of novel genes on phenotypic evolution, highlight the fact that the complexity of the novel tissue or behavior, the type of novelty, and the developmental timing of evolutionary modifications will all influence how novel and conserved genes interact to generate diversity.
Affiliation(s)
- Jessica A. Goodheart
- Division of Invertebrate Zoology, American Museum of Natural History, New York, NY, USA
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
- Robin A. Rio
- Bioengineering Department, Stanford University, Stanford, CA, USA
- Neville F. Taraporevala
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
- Department of Wildland Resources, Utah State University, Logan, UT, USA
- Rose A. Fiorenza
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
- Seth R. Barnes
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
- Kevin Morrill
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
- Mark Allan C. Jacob
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
- Carl Whitesel
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
- Park Masterson
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
- Grant O. Batzel
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
- Hereroa T. Johnston
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
- M. Desmond Ramirez
- Department of Biology, University of Massachusetts Amherst, Amherst, MA, USA
- Paul S. Katz
- Department of Biology, University of Massachusetts Amherst, Amherst, MA, USA
- Deirdre C. Lyons
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
25
Wang Z, Wang YW, Kasuga T, Lopez-Giraldez F, Zhang Y, Zhang Z, Wang Y, Dong C, Sil A, Trail F, Yarden O, Townsend JP. Lineage-specific genes are clustered with HET-domain genes and respond to environmental and genetic manipulations regulating reproduction in Neurospora. PLoS Genet 2023; 19:e1011019. [PMID: 37934795 PMCID: PMC10684091 DOI: 10.1371/journal.pgen.1011019] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/23/2023] [Revised: 11/28/2023] [Accepted: 10/16/2023] [Indexed: 11/09/2023]
Abstract
Lineage-specific genes (LSGs) have long been postulated to play roles in the establishment of genetic barriers to intercrossing and speciation. In the genome of Neurospora crassa, most of the 670 Neurospora LSGs that are aggregated adjacent to the telomeres are clustered with 61% of the HET-domain genes, some of which regulate self-recognition and define vegetative incompatibility groups. In contrast, the LSG-encoding proteins possess few to no domains that would help to identify potential functional roles. Possible functional roles of LSGs were further assessed by performing transcriptomic profiling in genetic mutants and in response to environmental alterations, as well as examining gene knockouts for phenotypes. Among the 342 LSGs that are dynamically expressed during both asexual and sexual phases, 64% were detectable on unusual carbon sources such as furfural, a wildfire-produced chemical that is a strong inducer of sexual development, and the structurally-related furan 5-hydroxymethyl furfural (HMF). Expression of a significant portion of the LSGs was sensitive to light and temperature, factors that also regulate the switch from asexual to sexual reproduction. Furthermore, expression of the LSGs was significantly affected in the knockouts of adv-1 and pp-1 that regulate hyphal communication, and expression of more than one quarter of the LSGs was affected by perturbation of the mating locus. These observations encouraged further investigation of the roles of clustered lineage-specific and HET-domain genes in ecology and reproduction regulation in Neurospora, especially the regulation of the switch from the asexual growth to sexual reproduction, in response to dramatic environmental conditions changes.
Affiliation(s)
- Zheng Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
- Yen-Wen Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
- Takao Kasuga
- College of Biological Sciences, University of California, Davis, California, United States of America
- Yang Zhang
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Zhang Zhang
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Yaning Wang
- Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- Caihong Dong
- Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- Anita Sil
- Department of Microbiology and Immunology, University of California, San Francisco, California, United States of America
- Frances Trail
- Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, Michigan, United States of America
- Oded Yarden
- Department of Plant Pathology and Microbiology, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Jeffrey P. Townsend
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
- Department of Ecology and Evolutionary Biology, Program in Microbiology, and Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
26
Gunasekera RS, Raja KKB, Hewapathirana S, Tundrea E, Gunasekera V, Galbadage T, Nelson PA. ORFanID: A web-based search engine for the discovery and identification of orphan and taxonomically restricted genes. PLoS One 2023; 18:e0291260. [PMID: 37879070 PMCID: PMC10599687 DOI: 10.1371/journal.pone.0291260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Accepted: 08/24/2023] [Indexed: 10/27/2023]
Abstract
With the numerous genomes sequenced today, it has been revealed that a noteworthy percentage of genes in a given taxon of organisms in the phylogenetic tree of life do not have orthologous sequences in other taxa. These sequences are commonly referred to as "orphans" or "ORFans" if found as single occurrences in a single species or as "taxonomically restricted genes" (TRGs) when found at higher taxonomic levels. Quantitative and collective studies of these genes are necessary for understanding their biological origins. However, the current software for identifying orphan genes is limited in its functionality and database search range, and is algorithmically very complex. Thus, researchers studying orphan genes must harvest their data from many disparate sources. ORFanID is a graphical web-based search engine that facilitates the efficient identification of both orphan genes and TRGs at all taxonomic levels, from DNA or amino acid sequences in the NCBI database cluster and other large bioinformatics repositories. The software allows users to identify genes that are unique to any taxonomic rank, from species to domain, using NCBI systematic classifiers. It provides control over NCBI database search parameters, and the results are presented in a spreadsheet as well as a graphical display. The tables in the software are sortable, and results can be filtered using the fuzzy search functionality. The visual presentation can be expanded and collapsed by the taxonomic tree to its various branches. Example results from searches on five species and gene expression data from specific orphan genes are provided in the Supplementary Information.
Affiliation(s)
- Richard S. Gunasekera
- Department of Chemistry, Physics and Engineering, School of Science, Technology & Health, Biola University, La Mirada, CA, United States of America
- Komal K. B. Raja
- Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, United States of America
- Suresh Hewapathirana
- European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- Emanuel Tundrea
- Griffiths School of Management and IT, Emanuel University of Oradea, Oradea, Romania
- Vinodh Gunasekera
- Bioinformatics, Chesalon USA, Inc., Houston, TX, United States of America
- Thushara Galbadage
- Department of Kinesiology and Public Health, School of Science, Technology & Health, Biola University, La Mirada, CA, United States of America
- Paul A. Nelson
- Biola University, La Mirada, CA, United States of America
27
Wang Z, Wang YW, Kasuga T, Hassler H, Lopez-Giraldez F, Dong C, Yarden O, Townsend JP. Origins of lineage-specific elements via gene duplication, relocation, and regional rearrangement in Neurospora crassa. Mol Ecol 2023. [PMID: 37843462 DOI: 10.1111/mec.17168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 09/20/2023] [Accepted: 09/27/2023] [Indexed: 10/17/2023]
Abstract
The origin of new genes has long been a central interest of evolutionary biologists. However, their novelty means that they evade reconstruction by the classical tools of evolutionary modelling. This evasion of deep ancestral investigation necessitates intensive study of model species within well-sampled, recently diversified, clades. One such clade is the model genus Neurospora, members of which lack recent gene duplications. Several Neurospora species are comprehensively characterized organisms apt for studying the evolution of lineage-specific genes (LSGs). Using gene synteny, we documented that 78% of Neurospora LSG clusters are located adjacent to the telomeres featuring extensive tracts of non-coding DNA and duplicated genes. Here, we report several instances of LSGs that are likely from regional rearrangements and potentially from gene rebirth. To broadly investigate the functions of LSGs, we assembled transcriptomics data from 68 experimental data points and identified co-regulatory modules using Weighted Gene Correlation Network Analysis, revealing that LSGs are widely but peripherally involved in known regulatory machinery for diverse functions. The ancestral status of the LSG mas-1, a gene with roles in cell-wall integrity and cellular sensitivity to antifungal toxins, was investigated in detail alongside its genomic neighbours, indicating that it arose from an ancient lysophospholipase precursor that is ubiquitous in lineages of the Sordariomycetes. Our discoveries illuminate a "rummage region" in the N. crassa genome that enables the formation of new genes and functions to arise via gene duplication and relocation, followed by fast mutation and recombination facilitated by sequence repeats and unconstrained non-coding sequences.
Affiliation(s)
- Zheng Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
- Yen-Wen Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
- Takao Kasuga
- College of Biological Sciences, University of California, Davis, Davis, California, USA
- Hayley Hassler
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
- Caihong Dong
- Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- Oded Yarden
- Department of Plant Pathology and Microbiology, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Jeffrey P Townsend
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
- Department of Ecology and Evolutionary Biology, Program in Microbiology, and Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
28
Jia K, Kilinc M, Jernigan RL. New alignment method for remote protein sequences by the direct use of pairwise sequence correlations and substitutions. FRONTIERS IN BIOINFORMATICS 2023; 3:1227193. [PMID: 37900964 PMCID: PMC10602800 DOI: 10.3389/fbinf.2023.1227193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 08/14/2023] [Indexed: 10/31/2023]
Abstract
Understanding protein sequences and how they relate to the functions of proteins is extremely important. One of the most basic operations in bioinformatics is sequence alignment, and usually the first things learned from an alignment are which positions are the most conserved; these are often critical parts of the structure, such as enzyme active site residues. In addition, the contact pairs in a protein usually correspond closely to the correlations between residue positions in the multiple sequence alignment, and these usually change in a systematic and coordinated way: if one position changes, then the other member of the pair also changes to compensate. In the present work, these correlated pairs are taken as anchor points for a new type of sequence alignment. The main advantage of the method is that it combines the remote homolog detection from our method PROST with pairwise sequence substitutions in the rigorous method from Kleinjung et al. We show a few examples of some resulting sequence alignments, and how they can lead to improvements in alignments for function, even for a disordered protein.
Affiliation(s)
- Kejue Jia
- Roy J. Carver Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA, United States
- Mesih Kilinc
- Roy J. Carver Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
- Robert L. Jernigan
- Roy J. Carver Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
29
Ardern Z. Alternative Reading Frames are an Underappreciated Source of Protein Sequence Novelty. J Mol Evol 2023; 91:570-580. [PMID: 37326679 DOI: 10.1007/s00239-023-10122-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2022] [Accepted: 05/31/2023] [Indexed: 06/17/2023]
Abstract
Protein-coding DNA sequences can be translated into completely different amino acid sequences if the nucleotide triplets used are shifted by a non-triplet amount on the same DNA strand or by translating codons from the opposite strand. Such "alternative reading frames" of protein-coding genes are a major contributor to the evolution of novel protein products. Recent studies demonstrating this include examples across the three domains of cellular life and in viruses. These sequences increase the number of trials potentially available for the evolutionary invention of new genes and also have unusual properties which may facilitate gene origin. There is evidence that the structure of the standard genetic code contributes to the features and gene-likeness of some alternative frame sequences. These findings have important implications across diverse areas of molecular biology, including for genome annotation, structural biology, and evolutionary genomics.
30
Wiberg RAW, Brand JN, Viktorin G, Mitchell JO, Beisel C, Schärer L. Genome assemblies of the simultaneously hermaphroditic flatworms Macrostomum cliftonense and Macrostomum hystrix. G3 (BETHESDA, MD.) 2023; 13:jkad149. [PMID: 37398989 PMCID: PMC10468722 DOI: 10.1093/g3journal/jkad149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Revised: 06/05/2023] [Accepted: 06/21/2023] [Indexed: 07/04/2023]
Abstract
The free-living, simultaneously hermaphroditic flatworms of the genus Macrostomum are increasingly used as model systems in various contexts. In particular, Macrostomum lignano, the only species of this group with a published genome assembly, has emerged as a model for the study of regeneration, reproduction, and stem-cell function. However, challenges have emerged due to M. lignano being a hidden polyploid, having recently undergone whole-genome duplication and chromosome fusion events. This complex genome architecture presents a significant roadblock to the application of many modern genetic tools. Hence, additional genomic resources for this genus are needed. Here, we present such resources for Macrostomum cliftonense and Macrostomum hystrix, which represent the contrasting mating behaviors of reciprocal copulation and hypodermic insemination found in the genus. We use a combination of PacBio long-read sequencing and Illumina shot-gun sequencing, along with several RNA-Seq data sets, to assemble and annotate highly contiguous genomes for both species. The assemblies span ∼227 and ∼220 Mb and are represented by 399 and 42 contigs for M. cliftonense and M. hystrix, respectively. Furthermore, high BUSCO completeness (∼84-85%), low BUSCO duplication rates (8.3-6.2%), and low k-mer multiplicity indicate that these assemblies do not suffer from the same assembly ambiguities of the M. lignano genome assembly, which can be attributed to the complex karyology of this species. We also show that these resources, in combination with the prior resources from M. lignano, offer an excellent foundation for comparative genomic research in this group of organisms.
Affiliation(s)
- R Axel W Wiberg
- Department of Environmental Sciences, Zoological Institute, University of Basel, Basel 4051, Switzerland
- Jeremias N Brand
- Department of Environmental Sciences, Zoological Institute, University of Basel, Basel 4051, Switzerland
- Gudrun Viktorin
- Department of Environmental Sciences, Zoological Institute, University of Basel, Basel 4051, Switzerland
- Jack O Mitchell
- Department of Environmental Sciences, Zoological Institute, University of Basel, Basel 4051, Switzerland
- Christian Beisel
- Department of Biosystems Science and Engineering, ETH Zürich, Basel 4058, Switzerland
- Lukas Schärer
- Department of Environmental Sciences, Zoological Institute, University of Basel, Basel 4051, Switzerland
31
O’Meara MJ, Rapala JR, Nichols CB, Alexandre C, Billmyre RB, Steenwyk JL, Alspaugh JA, O’Meara TR. CryptoCEN: A Co-Expression Network for Cryptococcus neoformans reveals novel proteins involved in DNA damage repair. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.17.553567. [PMID: 37645941 PMCID: PMC10462067 DOI: 10.1101/2023.08.17.553567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]
Abstract
Elucidating gene function is a major goal in biology, especially among non-model organisms. However, doing so is complicated by the fact that molecular conservation does not always mirror functional conservation, and that complex relationships among genes are responsible for encoding pathways and higher-order biological processes. Co-expression, a promising approach for predicting gene function, relies on the general principle that genes with similar expression patterns across multiple conditions will likely be involved in the same biological process. For Cryptococcus neoformans, a prevalent human fungal pathogen greatly diverged from model yeasts, approximately 60% of the predicted genes in the genome lack functional annotations. Here, we leveraged a large amount of publicly available transcriptomic data to generate a C. neoformans Co-Expression Network (CryptoCEN), successfully recapitulating known protein networks, predicting gene function, and enabling insights into the principles influencing co-expression. With 100% predictive accuracy, we used CryptoCEN to identify 13 new DNA damage response genes, underscoring the utility of guilt-by-association for determining gene function. Overall, co-expression is a powerful tool for uncovering gene function, and decreases the experimental tests needed to identify functions for currently under-annotated genes.
Affiliation(s)
- Matthew J. O’Meara
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Jackson R. Rapala
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA
- Connie B. Nichols
- Departments of Medicine and Molecular Genetics/Microbiology; and Cell Biology, Duke University School of Medicine, Durham, North Carolina, USA
- Christina Alexandre
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA
- R. Blake Billmyre
- Departments of Pharmaceutical and Biomedical Sciences/Infectious Disease, College of Pharmacy/College of Veterinary Medicine, University of Georgia, Athens, Georgia, USA
- Jacob L Steenwyk
- Howard Hughes Medical Institute and the Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- J. Andrew Alspaugh
- Departments of Medicine and Molecular Genetics/Microbiology; and Cell Biology, Duke University School of Medicine, Durham, North Carolina, USA
- Teresa R. O’Meara
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA
32
Kandathil SM, Lau AM, Jones DT. Machine learning methods for predicting protein structure from single sequences. Curr Opin Struct Biol 2023; 81:102627. [PMID: 37320955 DOI: 10.1016/j.sbi.2023.102627] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 01/25/2023] [Revised: 05/17/2023] [Accepted: 05/17/2023] [Indexed: 06/17/2023]
Abstract
Recent breakthroughs in protein structure prediction have increasingly relied on the use of deep neural networks. These recent methods are notable in that they produce 3-D atomic coordinates as a direct output of the networks, a feature which presents many advantages. Although most techniques of this type make use of multiple sequence alignments as their primary input, a new wave of methods have attempted to use just single sequences as the input. We discuss the make-up and operating principles of these models, and highlight new developments in these areas, as well as areas for future development.
Affiliation(s)
- Shaun M Kandathil
- Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, United Kingdom
- Andy M Lau
- Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, United Kingdom
- David T Jones
- Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, United Kingdom
33
Athanasouli M, Akduman N, Röseler W, Theam P, Rödelsperger C. Thousands of Pristionchus pacificus orphan genes were integrated into developmental networks that respond to diverse environmental microbiota. PLoS Genet 2023; 19:e1010832. [PMID: 37399201 DOI: 10.1371/journal.pgen.1010832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Accepted: 06/15/2023] [Indexed: 07/05/2023]
Abstract
Adaptation of organisms to environmental change may be facilitated by the creation of new genes. New genes without homologs in other lineages are known as taxonomically-restricted orphan genes and may result from divergence or de novo formation. Previously, we have extensively characterized the evolution and origin of such orphan genes in the nematode model organism Pristionchus pacificus. Here, we employ large-scale transcriptomics to establish potential functional associations and to measure the degree of transcriptional plasticity among orphan genes. Specifically, we analyzed 24 RNA-seq samples from adult P. pacificus worms raised on 24 different monoxenic bacterial cultures. Based on coexpression analysis, we identified 28 large modules that harbor 3,727 diplogastrid-specific orphan genes and that respond dynamically to different bacteria. These coexpression modules have distinct regulatory architecture and also exhibit differential expression patterns across development, suggesting a link between bacterial response networks and development. Phylostratigraphy revealed a considerably high number of family- and even species-specific orphan genes in certain coexpression modules. This suggests that new genes are not attached randomly to existing cellular networks and that integration can happen very fast. Integrative analysis of protein domains, gene expression and ortholog data facilitated the assignments of biological labels for 22 coexpression modules, with one of the largest, fast-evolving modules being associated with spermatogenesis. In summary, this work presents the first functional annotation for thousands of P. pacificus orphan genes and reveals insights into their integration into environmentally responsive gene networks.
Affiliation(s)
- Marina Athanasouli
- Department for Integrative Evolutionary Biology, Max Planck Institute for Biology, Tübingen, Germany
- Nermin Akduman
- Department for Integrative Evolutionary Biology, Max Planck Institute for Biology, Tübingen, Germany
- Waltraud Röseler
- Department for Integrative Evolutionary Biology, Max Planck Institute for Biology, Tübingen, Germany
- Penghieng Theam
- Department for Integrative Evolutionary Biology, Max Planck Institute for Biology, Tübingen, Germany
- Christian Rödelsperger
- Department for Integrative Evolutionary Biology, Max Planck Institute for Biology, Tübingen, Germany
34
Nur M, Wood K, Michelmore R. EffectorO: Motif-Independent Prediction of Effectors in Oomycete Genomes Using Machine Learning and Lineage Specificity. MOLECULAR PLANT-MICROBE INTERACTIONS : MPMI 2023; 36:397-410. [PMID: 36853198 DOI: 10.1094/mpmi-11-22-0236-ta] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/18/2023]
Abstract
Oomycete plant pathogens cause a wide variety of diseases, including late blight of potato, sudden oak death, and downy mildews of plants. These pathogens are major contributors to loss in numerous food crops. Oomycetes secrete effector proteins to manipulate their hosts to the advantage of the pathogen. Plants have evolved to recognize effectors, resulting in an evolutionary cycle of defense and counter-defense in plant-microbe interactions. This selective pressure results in highly diverse effector sequences that can be difficult to computationally identify using only sequence similarity. We developed a novel effector prediction tool, EffectorO, that uses two complementary approaches to predict effectors in oomycete pathogen genomes: i) a machine learning-based pipeline that predicts effector probability based on the biochemical properties of the N-terminal amino-acid sequence of a protein and ii) a pipeline based on lineage specificity to find proteins that are unique to one species or genus, a sign of evolutionary divergence due to adaptation to the host. We tested EffectorO on Bremia lactucae, which causes lettuce downy mildew, and Phytophthora infestans, which causes late blight of potato and tomato, and predicted many novel effector candidates while recovering the majority of known effector candidates. EffectorO will be useful for discovering novel families of oomycete effectors without relying on sequence similarity to known effectors. Copyright © 2023 The Author(s). This is an open access article distributed under the CC BY-NC-ND 4.0 International license.
Affiliation(s)
- Munir Nur
  - The Genome Center, University of California, Davis, CA, U.S.A
- Kelsey Wood
  - The Genome Center, University of California, Davis, CA, U.S.A
  - Integrative Genetics & Genomics Graduate Group, University of California, Davis, CA, U.S.A
- Richard Michelmore
  - The Genome Center, University of California, Davis, CA, U.S.A
  - Departments of Plant Sciences, Molecular & Cellular Biology, Medical Microbiology & Immunology, University of California, Davis, CA, U.S.A
35
Peng J, Zhao L. The origin and structural evolution of de novo genes in Drosophila. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.13.532420. [PMID: 37425675 PMCID: PMC10326970 DOI: 10.1101/2023.03.13.532420] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 07/11/2023]
Abstract
Although previously thought to be unlikely, recent studies have shown that de novo gene origination from previously non-genic sequences is a relatively common mechanism for gene innovation in many species and taxa. These young genes provide a unique set of candidates to study the structural and functional origination of proteins. However, our understanding of their protein structures and how these structures originate and evolve is still limited, due to a lack of systematic studies. Here, we combined high-quality base-level whole genome alignments, bioinformatic analysis, and computational structure modeling to study the origination, evolution, and protein structure of lineage-specific de novo genes. We identified 555 de novo gene candidates in D. melanogaster that originated within the Drosophilinae lineage. We found a gradual shift in sequence composition, evolutionary rates, and expression patterns with their gene ages, which indicates possible gradual shifts or adaptations of their functions. Surprisingly, we found little overall protein structural changes for de novo genes in the Drosophilinae lineage. Using AlphaFold2, ESMFold, and molecular dynamics, we identified a number of de novo gene candidates with protein products that are potentially well-folded, many of which are more likely to contain transmembrane and signal proteins compared to other annotated protein-coding genes. Using ancestral sequence reconstruction, we found that most potentially well-folded proteins are often born folded. Interestingly, we observed one case where disordered ancestral proteins become ordered within a relatively short evolutionary time. Single-cell RNA-seq analysis in testis showed that although most de novo genes are enriched in spermatocytes, several young de novo genes are biased in the early spermatogenesis stage, indicating potentially important but less emphasized roles of early germline cells in the de novo gene origination in testis. This study provides a systematic overview of the origin, evolution, and structural changes of Drosophilinae-specific de novo genes.
Affiliation(s)
- Junhui Peng
  - Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY 10065, USA
- Li Zhao
  - Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY 10065, USA
36
Manley BF, Lotharukpong JS, Barrera-Redondo J, Llewellyn T, Yildirir G, Sperschneider J, Corradi N, Paszkowski U, Miska EA, Dallaire A. A highly contiguous genome assembly reveals sources of genomic novelty in the symbiotic fungus Rhizophagus irregularis. G3 (BETHESDA, MD.) 2023; 13:jkad077. [PMID: 36999556 PMCID: PMC10234402 DOI: 10.1093/g3journal/jkad077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Accepted: 03/17/2023] [Indexed: 06/02/2023]
Abstract
The root systems of most plant species are aided by the soil-foraging capacities of symbiotic arbuscular mycorrhizal (AM) fungi of the Glomeromycotina subphylum. Despite recent advances in our knowledge of the ecology and molecular biology of this mutualistic symbiosis, our understanding of the AM fungi genome biology is just emerging. Presented here is a close to T2T genome assembly of the model AM fungus Rhizophagus irregularis DAOM197198, achieved through Nanopore long-read DNA sequencing and Hi-C data. This haploid genome assembly of R. irregularis, alongside short- and long-read RNA-Sequencing data, was used to produce a comprehensive annotation catalog of gene models, repetitive elements, small RNA loci, and DNA cytosine methylome. A phylostratigraphic gene age inference framework revealed that the birth of genes associated with nutrient transporter activity and transmembrane ion transport systems predates the emergence of Glomeromycotina. While nutrient cycling in AM fungi relies on genes that existed in ancestor lineages, a burst of Glomeromycotina-restricted genetic innovation is also detected. Analysis of the chromosomal distribution of genetic and epigenetic features highlights evolutionarily young genomic regions that produce abundant small RNAs, suggesting active RNA-based monitoring of genetic sequences surrounding recently evolved genes. This chromosome-scale view of the genome of an AM fungus genome reveals previously unexplored sources of genomic novelty in an organism evolving under an obligate symbiotic life cycle.
Affiliation(s)
- Bethan F Manley
  - SPUN (Society for the Protection of Underground Networks), 3500 South DuPont Highway, Suite EI-101, Dover, DE 19901, USA
  - Gurdon Institute, University of Cambridge, Cambridge CB2 1QN, UK
- Jaruwatana S Lotharukpong
  - Department of Algal Development and Evolution, Max Planck Institute for Biology, Max-Planck-Ring 5, Tübingen 72076, Germany
- Josué Barrera-Redondo
  - Department of Algal Development and Evolution, Max Planck Institute for Biology, Max-Planck-Ring 5, Tübingen 72076, Germany
- Theo Llewellyn
  - Comparative Fungal Biology, Royal Botanic Gardens Kew, Jodrell Laboratory, Richmond TW9 3DS, UK
  - Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
- Gokalp Yildirir
  - Department of Biology, University of Ottawa, Ottawa, ON, Canada K1N 6N5
- Jana Sperschneider
  - Agriculture and Food, Commonwealth Scientific and Industrial Research Organisation, Canberra, ACT 2601, Australia
- Nicolas Corradi
  - Department of Biology, University of Ottawa, Ottawa, ON, Canada K1N 6N5
- Uta Paszkowski
  - Crop Science Centre, Department of Plant Sciences, University of Cambridge, Cambridge CB3 0LE, UK
- Eric A Miska
  - Gurdon Institute, University of Cambridge, Cambridge CB2 1QN, UK
  - Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1QW, UK
- Alexandra Dallaire
  - Gurdon Institute, University of Cambridge, Cambridge CB2 1QN, UK
  - Comparative Fungal Biology, Royal Botanic Gardens Kew, Jodrell Laboratory, Richmond TW9 3DS, UK
  - Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1QW, UK
37
Grandchamp A, Kühl L, Lebherz M, Brüggemann K, Parsch J, Bornberg-Bauer E. Population genomics reveals mechanisms and dynamics of de novo expressed open reading frame emergence in Drosophila melanogaster. Genome Res 2023; 33:872-890. [PMID: 37442576 PMCID: PMC10519401 DOI: 10.1101/gr.277482.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Accepted: 06/06/2023] [Indexed: 07/15/2023]
Abstract
Novel genes are essential for evolutionary innovations and differ substantially even between closely related species. Recently, multiple studies across many taxa showed that some novel genes arise de novo, that is, from previously noncoding DNA. To characterize the underlying mutations that allowed de novo gene emergence and their order of occurrence, homologous regions must be detected within noncoding sequences in closely related sister genomes. So far, most studies do not detect noncoding homologs of de novo genes because of incomplete assemblies and annotations, and long evolutionary distances separating genomes. Here, we overcome these issues by searching for de novo expressed open reading frames (neORFs), the not-yet fixed precursors of de novo genes that emerged within a single species. We sequenced and assembled genomes with long-read technology and the corresponding transcriptomes from inbred lines of Drosophila melanogaster, derived from seven geographically diverse populations. We found line-specific neORFs in abundance but few neORFs shared by lines, suggesting a rapid turnover. Gain and loss of transcription is more frequent than the creation of ORFs, for example, by forming new start and stop codons. Consequently, the gain of ORFs becomes rate limiting and is frequently the initial step in neORFs emergence. Furthermore, transposable elements (TEs) are major drivers for intragenomic duplications of neORFs, yet TE insertions are less important for the emergence of neORFs. However, highly mutable genomic regions around TEs provide new features that enable gene birth. In conclusion, neORFs have a high birth-death rate, are rapidly purged, but surviving neORFs spread neutrally through populations and within genomes.
Affiliation(s)
- Anna Grandchamp
  - Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany
- Lucas Kühl
  - Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany
- Marie Lebherz
  - Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany
- Kathrin Brüggemann
  - Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany
- John Parsch
  - Division of Evolutionary Biology, Faculty of Biology, Ludwig-Maximilians-Universität München, 82152 Munich, Germany
- Erich Bornberg-Bauer
  - Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany
  - Max Planck Institute for Biology Tübingen, Department of Protein Evolution, 72076 Tübingen, Germany
38
Sanejouand YH. On the Unknown Proteins of Eukaryotic Proteomes. J Mol Evol 2023:10.1007/s00239-023-10116-1. [PMID: 37219573 DOI: 10.1007/s00239-023-10116-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 05/07/2023] [Indexed: 05/24/2023]
Abstract
To study unknown proteins on a large scale, a reference system has been set up for the three better studied eukaryotic kingdoms, built with 36 proteomes as taxonomically diverse as possible. Proteins from 362 other eukaryotic proteomes with no known homologue in this set were then analyzed, focusing notably on singletons, that is, on such proteins with no known homologue in their own proteome. Consistently, for a given species, no more than 12% of the singletons thus found are known at the protein level, according to Uniprot. In addition, since they rely on the information found in the alignment of homologous sequences, predictions of AlphaFold2 for their tridimensional structure are poor. In the case of metazoan species, the number of singletons rarely exceeds 1000 for the species closest to the reference system (divergence times below 75 Myr). Interestingly, in the cases of viridiplantae and fungi, larger numbers of singletons are found for such species, as if the timescale on which singletons are added to proteomes were different in metazoa and in other eukaryotic kingdoms. In order to confirm this phenomenon, further studies of proteomes closer to those of the reference system are, however, needed.
Affiliation(s)
- Yves-Henri Sanejouand
  - US2B, UMR 6286 of CNRS, Nantes University, rue de la Houssinière, 44322, Nantes, France
39
Budzynski L, Pagnani A. Small-coupling expansion for multiple sequence alignment. Phys Rev E 2023; 107:044125. [PMID: 37198812 DOI: 10.1103/physreve.107.044125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2022] [Accepted: 03/27/2023] [Indexed: 05/19/2023]
Abstract
The alignment of biological sequences such as DNA, RNA, and proteins is one of the basic tools for detecting evolutionary patterns, as well as functional or structural characterizations between homologous sequences in different organisms. Typically, state-of-the-art bioinformatics tools are based on profile models that assume the statistical independence of the different sites of the sequences. Over the last years, it has become increasingly clear that homologous sequences show complex patterns of long-range correlations over the primary sequence as a consequence of the natural evolution process that selects genetic variants under the constraint of preserving the functional or structural determinants of the sequence. Here, we present an alignment algorithm based on message passing techniques that overcomes the limitations of profile models. Our method is based on a perturbative small-coupling expansion of the free energy of the model that assumes a linear chain approximation as the zeroth-order of the expansion. We test the potential of the algorithm against standard competing strategies on several biological sequences.
Affiliation(s)
- Louise Budzynski
  - DISAT, Politecnico di Torino, Corso Duca degli Abruzzi, 24, I-10129, Torino, Italy
  - Italian Institute for Genomic Medicine, IRCCS Candiolo, SP-142, I-10060, Candiolo, Italy
- Andrea Pagnani
  - DISAT, Politecnico di Torino, Corso Duca degli Abruzzi, 24, I-10129, Torino, Italy
  - Italian Institute for Genomic Medicine, IRCCS Candiolo, SP-142, I-10060, Candiolo, Italy
  - INFN, Sezione di Torino, Via Pietro Giuria, 1, 10125 Torino, Italy
40
Barrera-Redondo J, Lotharukpong JS, Drost HG, Coelho SM. Uncovering gene-family founder events during major evolutionary transitions in animals, plants and fungi using GenEra. Genome Biol 2023; 24:54. [PMID: 36964572 PMCID: PMC10037820 DOI: 10.1186/s13059-023-02895-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2022] [Accepted: 03/10/2023] [Indexed: 03/26/2023] [Open Access]
Abstract
We present GenEra (https://github.com/josuebarrera/GenEra), a DIAMOND-fueled gene-family founder inference framework that addresses previously raised limitations and biases in genomic phylostratigraphy, such as homology detection failure. GenEra also reduces computational time from several months to a few days for any genome of interest. We analyze the emergence of taxonomically restricted gene families during major evolutionary transitions in plants, animals, and fungi. Our results indicate that the impact of homology detection failure on inferred patterns of gene emergence is lineage-dependent, suggesting that plants are more prone to evolve novelty through the emergence of new genes compared to animals and fungi.
Affiliation(s)
- Josué Barrera-Redondo
  - Department of Algal Development and Evolution, Max Planck Institute for Biology, Max-Planck-Ring 5, 72076, Tübingen, Germany
- Jaruwatana Sodai Lotharukpong
  - Department of Algal Development and Evolution, Max Planck Institute for Biology, Max-Planck-Ring 5, 72076, Tübingen, Germany
- Hajk-Georg Drost
  - Computational Biology Group, Department of Molecular Biology, Max Planck Institute for Biology, Max-Planck-Ring 5, 72076, Tübingen, Germany
- Susana M Coelho
  - Department of Algal Development and Evolution, Max Planck Institute for Biology, Max-Planck-Ring 5, 72076, Tübingen, Germany
41
Lopez-Anido RN, Batzel GO, Ramirez G, Goodheart JA, Wang Y, Neal S, Lyons DC. Spatial-temporal expression analysis of lineage-restricted shell matrix proteins reveals shell field regionalization and distinct cell populations in the slipper snail Crepidula atrasolea. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.18.532128. [PMID: 36993573 PMCID: PMC10055211 DOI: 10.1101/2023.03.18.532128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/31/2023]
Abstract
Molluscs are one of the most morphologically diverse clades of metazoans, exhibiting an immense diversification of calcium carbonate structures, such as the shell. Biomineralization of the calcified shell is dependent on shell matrix proteins (SMPs). While SMP diversity is hypothesized to drive molluscan shell diversity, we are just starting to unravel SMP evolutionary history and biology. Here we leveraged two complementary model mollusc systems, Crepidula fornicata and Crepidula atrasolea, to determine the lineage-specificity of 185 Crepidula SMPs. We found that 95% of the adult C. fornicata shell proteome belongs to conserved metazoan and molluscan orthogroups, with molluscan-restricted orthogroups containing half of all SMPs in the shell proteome. The low number of C. fornicata-restricted SMPs contradicts the generally-held notion that an animal’s biomineralization toolkit is dominated by mostly novel genes. Next, we selected a subset of lineage-restricted SMPs for spatial-temporal analysis using in situ hybridization chain reaction (HCR) during larval stages in C. atrasolea. We found that 12 out of 18 SMPs analyzed are expressed in the shell field. Notably, these genes are present in 5 expression patterns, which define at least three distinct cell populations within the shell field. These results represent the most comprehensive analysis of gastropod SMP evolutionary age and shell field expression patterns to date. Collectively, these data lay the foundation for future work to interrogate the molecular mechanisms and cell fate decisions underlying molluscan mantle specification and diversification.
42
Sandmann CL, Schulz JF, Ruiz-Orera J, Kirchner M, Ziehm M, Adami E, Marczenke M, Christ A, Liebe N, Greiner J, Schoenenberger A, Muecke MB, Liang N, Moritz RL, Sun Z, Deutsch EW, Gotthardt M, Mudge JM, Prensner JR, Willnow TE, Mertins P, van Heesch S, Hubner N. Evolutionary origins and interactomes of human, young microproteins and small peptides translated from short open reading frames. Mol Cell 2023; 83:994-1011.e18. [PMID: 36806354 PMCID: PMC10032668 DOI: 10.1016/j.molcel.2023.01.023] [Citation(s) in RCA: 30] [Impact Index Per Article: 30.0] [Received: 09/08/2022] [Revised: 12/12/2022] [Accepted: 01/25/2023] [Indexed: 02/19/2023]
Abstract
All species continuously evolve short open reading frames (sORFs) that can be templated for protein synthesis and may provide raw materials for evolutionary adaptation. We analyzed the evolutionary origins of 7,264 recently cataloged human sORFs and found that most were evolutionarily young and had emerged de novo. We additionally identified 221 previously missed sORFs potentially translated into peptides of up to 15 amino acids, all of which are smaller than the smallest human microprotein annotated to date. To investigate the bioactivity of sORF-encoded small peptides and young microproteins, we subjected 266 candidates to a mass-spectrometry-based interactome screen with motif resolution. Based on these interactomes and additional cellular assays, we can associate several candidates with mRNA splicing, translational regulation, and endocytosis. Our work provides insights into the evolutionary origins and interaction potential of young and small proteins, thereby helping to elucidate this underexplored territory of the human proteome.
Affiliation(s)
- Clara-L Sandmann
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347 Berlin, Germany
- Jana F Schulz
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347 Berlin, Germany
- Jorge Ruiz-Orera
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Marieluise Kirchner
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Core Facility Proteomics, 10117 Berlin, Germany
- Matthias Ziehm
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Core Facility Proteomics, 10117 Berlin, Germany
- Eleonora Adami
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Maike Marczenke
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Annabel Christ
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Nina Liebe
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Johannes Greiner
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Aaron Schoenenberger
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Michael B Muecke
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347 Berlin, Germany; Charité-Universitätsmedizin, 10117 Berlin, Germany
- Ning Liang
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Zhi Sun
  - Institute for Systems Biology, Seattle, WA 98109, USA
- Michael Gotthardt
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347 Berlin, Germany; Charité-Universitätsmedizin, 10117 Berlin, Germany
- Jonathan M Mudge
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- John R Prensner
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Division of Pediatric Hematology/Oncology, Boston Children's Hospital, Boston, MA 02115, USA
- Thomas E Willnow
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; Department of Biomedicine, Aarhus University, 8000 Aarhus, Denmark
- Philipp Mertins
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Core Facility Proteomics, 10117 Berlin, Germany
- Norbert Hubner
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347 Berlin, Germany; Charité-Universitätsmedizin, 10117 Berlin, Germany
43
Evolution and implications of de novo genes in humans. Nat Ecol Evol 2023:10.1038/s41559-023-02014-y. [PMID: 36928843 DOI: 10.1038/s41559-023-02014-y] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 07/24/2022] [Accepted: 02/06/2023] [Indexed: 03/18/2023]
Abstract
Genes and translated open reading frames (ORFs) that emerged de novo from previously non-coding sequences provide species with opportunities for adaptation. When aberrantly activated, some human-specific de novo genes and ORFs have disease-promoting properties, for instance driving tumour growth. Thousands of putative de novo coding sequences have been described in humans, but we still do not know what fraction of those ORFs has readily acquired a function. Here, we discuss the challenges and controversies surrounding the detection, mechanisms of origin, annotation, validation and characterization of de novo genes and ORFs. Through manual curation of literature and databases, we provide a thorough table with most de novo genes reported for humans to date. We re-evaluate each locus by tracing the enabling mutations and list proposed disease associations, protein characteristics and supporting evidence for translation and protein detection. This work will support future explorations of de novo genes and ORFs in humans.
44
Karlowski WM, Varshney D, Zielezinski A. Taxonomically Restricted Genes in Bacillus may Form Clusters of Homologs and Can be Traced to a Large Reservoir of Noncoding Sequences. Genome Biol Evol 2023; 15:7039703. [PMID: 36790099 PMCID: PMC10003748 DOI: 10.1093/gbe/evad023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Revised: 01/09/2023] [Accepted: 02/08/2023] [Indexed: 02/16/2023] [Open Access]
Abstract
Taxonomically restricted genes (TRGs) are unique for a defined group of organisms and may act as potential genetic determinants of lineage-specific, biological properties. Here, we explore the TRGs of highly diverse and economically important Bacillus bacteria by examining commonly used TRG identification parameters and data sources. We show the significant effects of sequence similarity thresholds, composition, and the size of the reference database in the identification process. Subsequently, we applied stringent TRG search parameters and expanded the identification procedure by incorporating an analysis of noncoding and non-syntenic regions of non-Bacillus genomes. A multiplex annotation procedure minimized the number of false-positive TRG predictions and showed nearly one-third of the alleged TRGs could be mapped to genes missed in genome annotations. We traced the putative origin of TRGs by identifying homologous, noncoding genomic regions in non-Bacillus species and detected sequence changes that could transform these regions into protein-coding genes. In addition, our analysis indicated that Bacillus TRGs represent a specific group of genes mostly showing intermediate sequence properties between genes that are conserved across multiple taxa and nonannotated peptides encoded by open reading frames.
Affiliation(s)
- Wojciech M Karlowski
  - Department of Computational Biology, Adam Mickiewicz University in Poznan, Uniwersytetu Poznanskiego 6, Poznan, Poland
- Deepti Varshney
  - Department of Computational Biology, Adam Mickiewicz University in Poznan, Uniwersytetu Poznanskiego 6, Poznan, Poland
- Andrzej Zielezinski
  - Department of Computational Biology, Adam Mickiewicz University in Poznan, Uniwersytetu Poznanskiego 6, Poznan, Poland
45
Kilinc M, Jia K, Jernigan RL. Improved global protein homolog detection with major gains in function identification. Proc Natl Acad Sci U S A 2023; 120:e2211823120. [PMID: 36827259 PMCID: PMC9992864 DOI: 10.1073/pnas.2211823120] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 07/09/2022] [Accepted: 01/20/2023] [Indexed: 02/25/2023] [Open Access]
Abstract
There are several hundred million protein sequences, but the relationships among them are not fully available from existing homolog detection methods. There is an essential need for an improved method to push homolog detection to lower levels of sequence identity. The method used here relies on a language model to represent proteins numerically in a matrix (an embedding) and uses discrete cosine transforms to compress the data to extract the most essential part, significantly reducing the data size. This PRotein Ortholog Search Tool (PROST) is significantly faster with linear runtimes, and most importantly, computes the distances between pairs of protein sequences to yield homologs at significantly lower levels of sequence identity than previously. The extent of allosteric effects in proteins points out the importance of global aspects of structure and sequence. PROST excels at global homology detection but not at detecting local homologs. Results are validated by strong similarities between the corresponding pairs of structures. The number of remote homologs detected increased significantly and pushes the effective sequence matches more deeply into the twilight zone. Human protein sequences presently having no assigned function now find significant numbers of putative homologs for 93% of cases and structurally verified assigned functions for 76.4% of these cases. The data compression enables massive searches for homologs with short search times while yielding significant gains in the numbers of remote homologs detected. The method is sufficiently efficient to permit whole-genome/proteome comparisons. The PROST web server is accessible at https://mesihk.github.io/prost.
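The compression step described in this abstract can be pictured with a small sketch. It is not the PROST implementation: the function names, the number of retained coefficients, the Manhattan distance, and the random arrays standing in for language-model embeddings are all assumptions made for illustration. The sketch applies a two-dimensional discrete cosine transform to a (sequence length x embedding dimension) matrix and keeps only the low-frequency block, so proteins of different lengths map to vectors of the same size that can be compared directly.

```python
import numpy as np


def dct2_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis as an (n, n) matrix; row k holds the k-th frequency."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0, :] *= np.sqrt(1.0 / n)
    basis[1:, :] *= np.sqrt(2.0 / n)
    return basis


def compress_embedding(emb: np.ndarray, keep_rows: int = 5, keep_cols: int = 8) -> np.ndarray:
    """Reduce a (length, dim) embedding to keep_rows * keep_cols low-frequency coefficients.

    keep_rows and keep_cols are hypothetical settings chosen for this example.
    """
    length, dim = emb.shape
    if length < keep_rows or dim < keep_cols:
        raise ValueError("embedding is smaller than the retained coefficient block")
    # 2D DCT: transform along the sequence axis, then along the feature axis.
    coeffs = dct2_matrix(length) @ emb @ dct2_matrix(dim).T
    return coeffs[:keep_rows, :keep_cols].ravel()


def manhattan_distance(a: np.ndarray, b: np.ndarray) -> float:
    """L1 distance between two compressed vectors (an assumed choice of metric)."""
    return float(np.abs(a - b).sum())


# Random matrices stand in for real per-residue embeddings.
rng = np.random.default_rng(0)
protein_a = rng.normal(size=(120, 32))
protein_a_variant = protein_a + 0.01 * rng.normal(size=(120, 32))
protein_c = rng.normal(size=(87, 32))  # a different length still gives a same-size vector

vec_a = compress_embedding(protein_a)
vec_a_variant = compress_embedding(protein_a_variant)
vec_c = compress_embedding(protein_c)

print(vec_a.shape, vec_c.shape)
print(manhattan_distance(vec_a, vec_a_variant) < manhattan_distance(vec_a, vec_c))
```

Because every protein is reduced to the same small vector, a pairwise comparison costs the same regardless of sequence length, which is in the spirit of the short search times the abstract reports.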
Affiliation(s)
- Mesih Kilinc
  - Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011
- Kejue Jia
  - Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011
- Robert L. Jernigan
  - Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011
  - Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011
46
Chen Y, Ma T, Zhang T, Ma L. Trends in the evolution of intronless genes in Poaceae. FRONTIERS IN PLANT SCIENCE 2023; 14:1065631. [PMID: 36875616 PMCID: PMC9978806 DOI: 10.3389/fpls.2023.1065631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2022] [Accepted: 02/01/2023] [Indexed: 06/18/2023]
Abstract
Intronless genes (IGs), which are a feature of prokaryotes, are a fascinating group of genes that are also present in eukaryotes. In the current study, a comparison of Poaceae genomes revealed that the origin of IGs may have involved ancient intronic splicing, reverse transcription, and retrotranspositions. Additionally, IGs exhibit the typical features of rapid evolution, including recent duplications, variable copy numbers, low divergence between paralogs, and high non-synonymous to synonymous substitution ratios. By tracing IG families along the phylogenetic tree, we determined that the evolutionary dynamics of IGs differed among Poaceae subfamilies. IG families developed rapidly before the divergence of Pooideae and Oryzoideae and expanded slowly after the divergence. In contrast, they emerged gradually and consistently in the Chloridoideae and Panicoideae clades during evolution. Furthermore, IGs are expressed at low levels. Under relaxed selection pressure, retrotranspositions, intron loss, and gene duplications and conversions may promote the evolution of IGs. The comprehensive characterization of IGs is critical for in-depth studies on intron functions and evolution as well as for assessing the importance of introns in eukaryotes.
Affiliation(s)
- Yong Chen
- *Correspondence: Tingting Zhang; Lei Ma
- Lei Ma
- *Correspondence: Tingting Zhang; Lei Ma
47
Chang CH, Mejia Natividad I, Malik HS. Expansion and loss of sperm nuclear basic protein genes in Drosophila correspond with genetic conflicts between sex chromosomes. eLife 2023; 12:85249. [PMID: 36763410 PMCID: PMC9917458 DOI: 10.7554/elife.85249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Accepted: 01/04/2023] [Indexed: 02/11/2023] Open
Abstract
Many animal species employ sperm nuclear basic proteins (SNBPs) or protamines to package sperm genomes tightly. SNBPs vary across animal lineages and evolve rapidly in mammals. We used a phylogenomic approach to investigate SNBP diversification in Drosophila species. We found that most SNBP genes in Drosophila melanogaster evolve under positive selection except for genes essential for male fertility. Unexpectedly, evolutionarily young SNBP genes are more likely to be critical for fertility than ancient, conserved SNBP genes. For example, CG30056 is dispensable for male fertility despite being one of three SNBP genes universally retained in Drosophila species. We found 19 independent SNBP gene amplification events that occurred preferentially on sex chromosomes. Conversely, the montium group of Drosophila species lost otherwise-conserved SNBP genes, coincident with an X-Y chromosomal fusion. Furthermore, SNBP genes that became linked to sex chromosomes via chromosomal fusions were more likely to degenerate or relocate back to autosomes. We hypothesize that autosomal SNBP genes suppress meiotic drive, whereas sex-chromosomal SNBP expansions lead to meiotic drive. X-Y fusions in the montium group render autosomal SNBPs dispensable by making X-versus-Y meiotic drive obsolete or costly. Thus, genetic conflicts between sex chromosomes may drive SNBP rapid evolution during spermatogenesis in Drosophila species.
Affiliation(s)
- Ching-Ho Chang
- Division of Basic Sciences, Fred Hutchinson Cancer Center, Seattle, United States
- Isabel Mejia Natividad
- Division of Basic Sciences, Fred Hutchinson Cancer Center, Seattle, United States; Howard Hughes Medical Institute, Fred Hutchinson Cancer Center, Seattle, United States
- Harmit S Malik
- Division of Basic Sciences, Fred Hutchinson Cancer Center, Seattle, United States; Howard Hughes Medical Institute, Fred Hutchinson Cancer Center, Seattle, United States
48
Kimmel J, Schmitt M, Sinner A, Jansen PWTC, Mainye S, Ramón-Zamorano G, Toenhake CG, Wichers-Misterek JS, Cronshagen J, Sabitzki R, Mesén-Ramírez P, Behrens HM, Bártfai R, Spielmann T. Gene-by-gene screen of the unknown proteins encoded on Plasmodium falciparum chromosome 3. Cell Syst 2023; 14:9-23.e7. [PMID: 36657393 DOI: 10.1016/j.cels.2022.12.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2022] [Revised: 10/07/2022] [Accepted: 12/08/2022] [Indexed: 01/19/2023]
Abstract
Taxon-specific proteins are key determinants defining the biology of all organisms and represent prime drug targets in pathogens. However, lacking comparability with proteins in other lineages makes them particularly difficult to study. In malaria parasites, this is exacerbated by technical limitations. Here, we analyzed the cellular location, essentiality, function, and, in selected cases, interactome of all unknown non-secretory proteins encoded on an entire P. falciparum chromosome. The nucleus was the most common localization, indicating that it is a hotspot of parasite-specific biology. More in-depth functional studies with four proteins revealed essential roles in DNA replication and mitosis. The mitosis proteins defined a possible orphan complex and a highly diverged complex needed for spindle-kinetochore connection. Structure-function comparisons indicated that the taxon-specific proteins evolved by different mechanisms. This work demonstrates the feasibility of gene-by-gene screens to elucidate the biology of malaria parasites and reveal critical parasite-specific processes of interest as drug targets.
Affiliation(s)
- Jessica Kimmel
- Bernhard Nocht Institute for Tropical Medicine, Bernhard Nocht Str. 74, 20359 Hamburg, Germany
- Marius Schmitt
- Bernhard Nocht Institute for Tropical Medicine, Bernhard Nocht Str. 74, 20359 Hamburg, Germany
- Alexej Sinner
- Bernhard Nocht Institute for Tropical Medicine, Bernhard Nocht Str. 74, 20359 Hamburg, Germany
- Sheila Mainye
- Bernhard Nocht Institute for Tropical Medicine, Bernhard Nocht Str. 74, 20359 Hamburg, Germany
- Gala Ramón-Zamorano
- Bernhard Nocht Institute for Tropical Medicine, Bernhard Nocht Str. 74, 20359 Hamburg, Germany
- Christa Geeke Toenhake
- Department of Molecular Biology, Radboud Institute for Molecular Life Sciences, Radboud University, 6525 GA Nijmegen, the Netherlands
- Jakob Cronshagen
- Bernhard Nocht Institute for Tropical Medicine, Bernhard Nocht Str. 74, 20359 Hamburg, Germany
- Ricarda Sabitzki
- Bernhard Nocht Institute for Tropical Medicine, Bernhard Nocht Str. 74, 20359 Hamburg, Germany
- Paolo Mesén-Ramírez
- Bernhard Nocht Institute for Tropical Medicine, Bernhard Nocht Str. 74, 20359 Hamburg, Germany
- Hannah Michaela Behrens
- Bernhard Nocht Institute for Tropical Medicine, Bernhard Nocht Str. 74, 20359 Hamburg, Germany
- Richárd Bártfai
- Department of Molecular Biology, Radboud Institute for Molecular Life Sciences, Radboud University, 6525 GA Nijmegen, the Netherlands
- Tobias Spielmann
- Bernhard Nocht Institute for Tropical Medicine, Bernhard Nocht Str. 74, 20359 Hamburg, Germany
49
Metivier JC, Chain FJJ. Diversity in Expression Biases of Lineage-Specific Genes During Development and Anhydrobiosis Among Tardigrade Species. Evol Bioinform Online 2022; 18:11769343221140277. [PMID: 36578471 PMCID: PMC9791283 DOI: 10.1177/11769343221140277] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/05/2022] [Accepted: 10/27/2022] [Indexed: 12/24/2022] Open
Abstract
Lineage-specific genes can contribute to the emergence and evolution of novel traits and adaptations. Tardigrades are animals that have adapted to tolerate extreme conditions by undergoing a form of cryptobiosis called anhydrobiosis, a physical transformation to an inactive desiccated state. While studies to understand the genetics underlying the interspecies diversity in anhydrobiotic transitions have identified tardigrade-specific genes and family expansions involved in this process, the contributions of species-specific genes to the variation in tardigrade development and cryptobiosis are less clear. We used previously published transcriptomes throughout development and anhydrobiosis (5 embryonic stages, 7 juvenile stages, active adults, and tun adults) to assess the transcriptional biases of different classes of genes between 2 tardigrade species, Hypsibius exemplaris and Ramazzottius varieornatus. We also used the transcriptomes of 2 other tardigrades, Echiniscoides sigismundi and Richtersius coronifer, and data from 3 non-tardigrade species (Adineta vaga, Drosophila melanogaster, and Caenorhabditis elegans) to help identify lineage-specific genes. We found that lineage-specific genes have generally low and narrow expression but are enriched among biased genes in different stages of development depending on the species. Biased genes tend to be specific to early and late development, but there is little overlap in functional enrichment of biased genes between species. Gene expansions in the 2 tardigrades also involve families with different functions despite homologous genes being expressed during anhydrobiosis in both species. Our results demonstrate the interspecific variation in transcriptional contributions and biases of lineage-specific genes during development and anhydrobiosis in 2 tardigrades.
Affiliation(s)
- Frédéric J J Chain
- Department of Biological Sciences, University of Massachusetts Lowell, One University Ave, Lowell, MA 01854, USA
50
Wiberg RAW, Viktorin G, Schärer L. Mating strategy predicts gene presence/absence patterns in a genus of simultaneously hermaphroditic flatworms. Evolution 2022; 76:3054-3066. [PMID: 36199200 PMCID: PMC10092323 DOI: 10.1111/evo.14635] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2022] [Accepted: 09/28/2022] [Indexed: 01/22/2023]
Abstract
Gene repertoire turnover is a characteristic of genome evolution. However, we lack well-replicated analyses of presence/absence patterns associated with different selection contexts. Here, we study ∼100 transcriptome assemblies across Macrostomum, a genus of simultaneously hermaphroditic flatworms exhibiting multiple convergent shifts in mating strategy and associated reproductive morphologies. Many species mate reciprocally, with partners donating and receiving sperm at the same time. Other species convergently evolved to mate by hypodermic injection of sperm into the partner. We find that for orthologous transcripts annotated as expressed in the body region containing the testes, sequences from hypodermically inseminating species diverge more rapidly from the model species, Macrostomum lignano, and have a lower probability of being observed in other species. For other annotation categories, simpler models with a constant rate of similarity decay with increasing genetic distance from M. lignano match the observed patterns well. Thus, faster rates of sequence evolution for hypodermically inseminating species in testis-region genes result in higher rates of homology detection failure, yielding a signal of rapid evolution in sequence presence/absence patterns. Our results highlight the utility of considering appropriate null models for unobserved genes, as well as associating patterns of gene presence/absence with replicated evolutionary events in a phylogenetic context.
Affiliation(s)
- R Axel W Wiberg
- Zoological Institute, Department of Environmental Sciences, University of Basel, Basel, CH-4051, Switzerland; Evolutionary Biology, Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, SE-75236, Sweden
- Gudrun Viktorin
- Zoological Institute, Department of Environmental Sciences, University of Basel, Basel, CH-4051, Switzerland
- Lukas Schärer
- Zoological Institute, Department of Environmental Sciences, University of Basel, Basel, CH-4051, Switzerland