1
Aubel M, Buchel F, Heames B, Jones A, Honc O, Bornberg-Bauer E, Hlouchova K. High-throughput Selection of Human de novo-emerged sORFs with High Folding Potential. Genome Biol Evol 2024; 16:evae069. [PMID: 38597156 PMCID: PMC11024478 DOI: 10.1093/gbe/evae069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 03/11/2024] [Accepted: 03/23/2024] [Indexed: 04/11/2024]
Abstract
De novo genes emerge from previously noncoding stretches of the genome. Their encoded de novo proteins are generally expected to be similar to random sequences and, accordingly, to have no stable tertiary fold and high predicted disorder. However, structural properties of de novo proteins and whether they differ during the stages of emergence and fixation have not been studied in depth and rely heavily on predictions. Here we generated a library of short human putative de novo proteins of varying lengths and ages and sorted the candidates according to their structural compactness and disorder propensity. Using Förster resonance energy transfer combined with fluorescence-activated cell sorting, we were able to screen the library for the most compact protein structures, as well as the most elongated and flexible structures. We find that compact de novo proteins are on average slightly shorter and contain lower predicted disorder than less compact ones. The predicted structures for most and least compact de novo proteins correspond to expectations in that they contain more secondary structure content or higher disorder content, respectively. Our experiments indicate that older de novo proteins have higher compactness and structural propensity compared with young ones. We discuss possible evolutionary scenarios and their implications underlying the age-dependencies of compactness and structural content of putative de novo proteins.
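As background to the screening principle in this abstract, FRET efficiency drops steeply as the donor and acceptor move apart, so a higher FRET signal is read as a more compact protein. The sketch below only evaluates the standard Förster relation E = 1 / (1 + (r/R0)^6). The function name and the default Förster radius of 5.0 nm are illustrative assumptions, not values from the paper, and the paper's actual sorting gates are not reproduced.

```python
def fret_efficiency(distance_nm: float, forster_radius_nm: float = 5.0) -> float:
    """Return the FRET efficiency for a donor-acceptor distance.

    Uses the standard Foerster relation E = 1 / (1 + (r / R0)**6).
    The default R0 of 5.0 nm is a placeholder assumption, not a
    value taken from the cited study.
    """
    if distance_nm < 0 or forster_radius_nm <= 0:
        raise ValueError("distance must be >= 0 and Foerster radius must be > 0")
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


if __name__ == "__main__":
    # Shorter distances (more compact linkers) give higher efficiency.
    for r in (3.0, 5.0, 7.0):
        print(f"r = {r} nm -> E = {fret_efficiency(r):.3f}")
```

At r equal to R0 the efficiency is exactly one half, which is why distances near R0 give the most sensitive readout.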
Affiliation(s)
- Margaux Aubel
  - Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
- Filip Buchel
  - Department of Cell Biology, Faculty of Science, Charles University, Prague, Czech Republic
  - Department of Biochemistry, Faculty of Science, Charles University, Prague, Czech Republic
- Brennen Heames
  - Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
- Alun Jones
  - Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
- Ondrej Honc
  - Imaging Methods Core Facility, BIOCEV, Prague, Czech Republic
- Erich Bornberg-Bauer
  - Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
  - Department of Protein Evolution, Max Planck-Institute for Biology Tuebingen, Tuebingen, Germany
- Klara Hlouchova
  - Department of Cell Biology, Faculty of Science, Charles University, Prague, Czech Republic
  - Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague, Czech Republic
2
Coronado-Zamora M, González J. Transposons contribute to the functional diversification of the head, gut, and ovary transcriptomes across Drosophila natural strains. Genome Res 2023; 33:1541-1553. [PMID: 37793782 PMCID: PMC10620055 DOI: 10.1101/gr.277565.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2022] [Accepted: 08/08/2023] [Indexed: 10/06/2023]
Abstract
Transcriptomes are dynamic, with cells, tissues, and body parts expressing particular sets of transcripts. Transposable elements (TEs) are a known source of transcriptome diversity; however, studies often focus on a particular type of chimeric transcript, analyze single body parts or cell types, or are based on incomplete TE annotations from a single reference genome. In this work, we have implemented a method based on de novo transcriptome assembly that minimizes the potential sources of errors while identifying a comprehensive set of gene-TE chimeras. We applied this method to the head, gut, and ovary dissected from five Drosophila melanogaster natural strains, with individual reference genomes available. We found that ∼19% of body part-specific transcripts are gene-TE chimeras. Overall, chimeric transcripts contribute a mean of 43% to the total gene expression, and they provide protein domains for DNA binding, catalytic activity, and DNA polymerase activity. Our comprehensive data set is a rich resource for follow-up analysis. Moreover, because TEs are present in virtually all species sequenced to date, their role in spatially restricted transcript expression is likely not exclusive to the species analyzed in this work.
Affiliation(s)
- Josefa González
  - Institute of Evolutionary Biology, CSIC, UPF, Barcelona 08003, Spain
3
Bruley A, Bitard-Feildel T, Callebaut I, Duprat E. A sequence-based foldability score combined with AlphaFold2 predictions to disentangle the protein order/disorder continuum. Proteins 2023; 91:466-484. [PMID: 36306150 DOI: 10.1002/prot.26441] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/01/2022] [Revised: 10/14/2022] [Accepted: 10/18/2022] [Indexed: 11/11/2022]
Abstract
Order and disorder govern protein functions, but there is a great diversity in disorder, from regions that are, and stay, fully disordered to conditional order. This diversity is still difficult to decipher even though it is encoded in the amino acid sequences. Here, we developed an analytic Python package, named pyHCA, to estimate the foldability of a protein segment from its amino acid sequence alone, based on a measure of its density in regular secondary structures associated with hydrophobic clusters, as defined by the hydrophobic cluster analysis (HCA) approach. The tool was designed by optimizing the separation between foldable segments from databases of disorder (DisProt) and order (SCOPe [soluble domains] and OPM [transmembrane domains]). It allows one to specify the ratio between order, embodied by regular secondary structures (either participating in the hydrophobic core of well-folded 3D structures or conditionally formed in intrinsically disordered regions), and disorder. We illustrated the relevance of pyHCA with several examples and applied it to the sequences of the proteomes of 21 species ranging from prokaryotes and archaea to unicellular and multicellular eukaryotes, for which structure models are provided in the AlphaFold protein structure database. Cases of low-confidence scores related to disorder were distinguished from those of sequences that we identified as foldable but are still excluded from accurate modeling by AlphaFold2 due to a lack of sequence homologs or to compositional biases. Overall, our approach is complementary to AlphaFold2, providing guides to map structural innovations through evolutionary processes, at proteome and gene scales.
Affiliation(s)
- Apolline Bruley
  - Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Tristan Bitard-Feildel
  - Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Isabelle Callebaut
  - Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Elodie Duprat
  - Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
4
Domain Evolution of Vertebrate Blood Coagulation Cascade Proteins. J Mol Evol 2022; 90:418-428. [PMID: 36181519 DOI: 10.1007/s00239-022-10071-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/30/2021] [Accepted: 08/26/2022] [Indexed: 10/06/2022]
Abstract
Vertebrate blood coagulation is controlled by a cascade containing more than 20 proteins. The cascade proteins are found in the blood in their zymogen forms and when the cascade is triggered by tissue damage, zymogens are activated and in turn activate their downstream proteins by serine protease activity. In this study, we examined proteomes of 21 chordates, of which 18 are vertebrates, to reveal the modular evolution of the blood coagulation cascade. Additionally, two Arthropoda species were used to compare domain arrangements of the proteins belonging to the hemolymph clotting and the blood coagulation cascades. Within the vertebrate coagulation protein set, almost half of the studied proteins are shared with jawless vertebrates. Domain similarity analyses revealed that there are multiple possible evolutionary trajectories for each coagulation protein. During the evolution of higher vertebrate clades, gene and genome duplications led to the formation of other coagulation cascade proteins.
5
Chenevert M, Miller B, Karkoutli A, Rusnak A, Lott SE, Atallah J. The early embryonic transcriptome of a Hawaiian Drosophila picture-wing fly shows evidence of altered gene expression and novel gene evolution. J Exp Zool B Mol Dev Evol 2022; 338:277-291. [PMID: 35322942 DOI: 10.1002/jez.b.23129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2021] [Revised: 01/14/2022] [Accepted: 02/13/2022] [Indexed: 06/14/2023]
Abstract
A massive adaptive radiation on the Hawaiian archipelago has produced approximately one-quarter of the fly species in the family Drosophilidae. The Hawaiian Drosophila clade has long been recognized as a model system for the study of both the ecology of island endemics and the evolution of developmental mechanisms, but relatively few genomic and transcriptomic datasets are available for this group. We present here a differential expression analysis of the transcriptional profiles of two highly conserved embryonic stages in the Hawaiian picture-wing fly Drosophila grimshawi. When we compared our results to previously published datasets across the family Drosophilidae, we identified cases of both gains and losses of gene representation in D. grimshawi, including an apparent delay in Hox gene activation. We also found a high expression of unannotated genes. Most transcripts of unannotated genes with open reading frames do not have identified homologs in non-Hawaiian Drosophila species, although the vast majority have sequence matches in genomes of other Hawaiian picture-wing flies. Some of these unannotated genes may have arisen from noncoding sequence in the ancestor of Hawaiian flies or during the evolution of the clade. Our results suggest that both the modified use of ancestral genes and the evolution of new ones may occur in rapid radiations.
Affiliation(s)
- Madeline Chenevert
  - Department of Biological Sciences, University of New Orleans, New Orleans, Louisiana, USA
  - Hayward Genetics Center, Tulane University School of Medicine, New Orleans, Louisiana, USA
- Bronwyn Miller
  - Department of Biological Sciences, University of New Orleans, New Orleans, Louisiana, USA
- Ahmad Karkoutli
  - Department of Biological Sciences, University of New Orleans, New Orleans, Louisiana, USA
  - LSUHSC School of Medicine, New Orleans, Louisiana, USA
- Anna Rusnak
  - Department of Biological Sciences, University of New Orleans, New Orleans, Louisiana, USA
  - Center for Biomedical Engineering, Brown University, Box A-2, Arnold Lab, Providence, Rhode Island, USA
- Susan E Lott
  - Department of Evolution & Ecology, University of California-Davis, Davis, California, USA
- Joel Atallah
  - Department of Biological Sciences, University of New Orleans, New Orleans, Louisiana, USA
6
Bubnell JE, Ulbing CKS, Fernandez Begne P, Aquadro CF. Functional Divergence of the bag-of-marbles Gene in the Drosophila melanogaster Species Group. Mol Biol Evol 2022; 39:6609986. [PMID: 35714266 PMCID: PMC9250105 DOI: 10.1093/molbev/msac137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2023]
Abstract
In Drosophila melanogaster, a key germline stem cell (GSC) differentiation factor, bag of marbles (bam), shows rapid bursts of amino acid fixations between sibling species D. melanogaster and Drosophila simulans, but not in the outgroup species Drosophila ananassae. Here, we test the null hypothesis that bam's differentiation function is conserved between D. melanogaster and four additional Drosophila species in the melanogaster species group spanning approximately 30 million years of divergence. Surprisingly, we demonstrate that bam is not necessary for oogenesis or spermatogenesis in Drosophila teissieri, nor is bam necessary for spermatogenesis in D. ananassae. Remarkably, bam function may change on a relatively short time scale. We further report tests of neutral sequence evolution at bam in additional species of Drosophila and find a positive, but not perfect, correlation between evidence for positive selection at bam and its essential role in GSC regulation and fertility for both males and females. Further characterization of bam function in more divergent lineages will be necessary to distinguish between bam's critical gametogenesis role being newly derived in D. melanogaster, D. simulans, Drosophila yakuba, and D. ananassae females or it being basal to the genus and subsequently lost in numerous lineages.
Affiliation(s)
- Cynthia K S Ulbing
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA
7
New Genomic Signals Underlying the Emergence of Human Proto-Genes. Genes (Basel) 2022; 13:genes13020284. [PMID: 35205330 PMCID: PMC8871994 DOI: 10.3390/genes13020284] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/17/2021] [Revised: 01/20/2022] [Accepted: 01/24/2022] [Indexed: 12/04/2022]
Abstract
De novo genes are novel genes which emerge from non-coding DNA. Until now, little has been known about the properties of de novo genes and how these correlate with their age and mechanisms of emergence. In this study, we investigate four related properties: introns, upstream regulatory motifs, 5′ untranslated regions (UTRs) and protein domains, in 23,135 human proto-genes. We found that proto-genes contain introns, whose number and position correlate with the genomic position of proto-gene emergence. The origin of these introns is debated, as our results suggest that 41% of proto-genes might have captured existing introns, and 13.7% of them do not splice the ORF. We show that proto-genes which emerged via overprinting tend to be more enriched in core promoter motifs, while intergenic and intronic genes are more enriched in enhancers, even if the TATA motif is most commonly found upstream in these genes. Intergenic and intronic 5′ UTRs of proto-genes have a lower potential to stabilise mRNA structures than exonic proto-genes and established human genes. Finally, we confirm that proteins expressed by proto-genes gain new putative domains with age. Overall, we find that regulatory motifs inducing transcription and translation of previously non-coding sequences may facilitate proto-gene emergence. Our study demonstrates that introns, 5′ UTRs, and domains have specific properties in proto-genes. We also emphasize that the genomic positions of de novo genes strongly impact these properties.
8
Klein B, Holmér L, Smith KM, Johnson MM, Swain A, Stolp L, Teufel AI, Kleppe AS. A computational exploration of resilience and evolvability of protein-protein interaction networks. Commun Biol 2021; 4:1352. [PMID: 34857859 PMCID: PMC8639913 DOI: 10.1038/s42003-021-02867-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/09/2020] [Accepted: 11/03/2021] [Indexed: 11/09/2022]
Abstract
Protein-protein interaction (PPI) networks represent complex intra-cellular protein interactions, and the presence or absence of such interactions can lead to biological changes in an organism. Recent network-based approaches have shown that a phenotype's PPI network's resilience to environmental perturbations is related to its placement in the tree of life, though we still do not know how or why certain intra-cellular factors can bring about this resilience. Here, we explore the influence of gene expression and network properties on PPI networks' resilience. We use publicly available data of PPIs for E. coli, S. cerevisiae, and H. sapiens, where we compute changes in network resilience as new nodes (proteins) are added to the networks under three node addition mechanisms: random, degree-based, and gene-expression-based attachment. By calculating the resilience of the resulting networks, we estimate the effectiveness of these node addition mechanisms. We demonstrate that adding nodes with gene-expression-based preferential attachment (as opposed to random or degree-based) preserves and can increase the original resilience of the PPI network in all three species, regardless of gene expression distribution or network structure. These findings introduce a general notion of prospective resilience, which highlights the key role of network structures in understanding the evolvability of phenotypic traits.
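The three node addition mechanisms named in this abstract can be illustrated with a small sketch. It is a toy under stated assumptions, not the authors' implementation: the graph is a plain adjacency dictionary, `attachment_weights` and `add_node` are hypothetical helper names, the weighting schemes (uniform, node degree, expression level of the existing node) are one plausible reading of the abstract, and the resilience measure itself is not reproduced because the abstract does not define it.

```python
import random
from typing import Dict, Hashable, Mapping, Optional, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets


def attachment_weights(graph: Graph, mechanism: str,
                       expression: Optional[Mapping[Hashable, float]] = None
                       ) -> Dict[Hashable, float]:
    """Weight each existing node as a candidate partner for a new node."""
    if mechanism == "random":
        return {n: 1.0 for n in graph}
    if mechanism == "degree":
        return {n: float(len(nbrs)) for n, nbrs in graph.items()}
    if mechanism == "expression":
        # Assumption: weight is the expression level of the existing node.
        expression = expression or {}
        return {n: float(expression.get(n, 0.0)) for n in graph}
    raise ValueError(f"unknown mechanism: {mechanism}")


def add_node(graph: Graph, new_node: Hashable, m: int, mechanism: str,
             expression: Optional[Mapping[Hashable, float]] = None,
             rng: Optional[random.Random] = None) -> Set[Hashable]:
    """Attach new_node to up to m distinct existing nodes; return the partners."""
    if new_node in graph:
        raise ValueError("new_node already present")
    rng = rng or random.Random()
    weights = attachment_weights(graph, mechanism, expression)
    candidates = [n for n, w in weights.items() if w > 0]
    targets: Set[Hashable] = set()
    # Weighted sampling without replacement, one partner at a time.
    while candidates and len(targets) < m:
        pick = rng.choices(candidates,
                           weights=[weights[n] for n in candidates], k=1)[0]
        targets.add(pick)
        candidates.remove(pick)
    graph[new_node] = set(targets)
    for t in targets:
        graph[t].add(new_node)
    return set(targets)
```

A resilience score would then be computed on the graph before and after repeated calls to `add_node` for each mechanism, which is the comparison the abstract describes.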
Affiliation(s)
- Brennan Klein
  - Network Science Institute, Northeastern University, Boston, MA, USA
  - Laboratory for the Modeling of Biological and Socio-Technical Systems, Northeastern University, Boston, MA, USA
- Ludvig Holmér
  - Center for Data Analytics, Stockholm School of Economics, Stockholm, Sweden
- Keith M. Smith
  - Department of Physics and Mathematics, Nottingham Trent University, Nottingham, UK
- Mackenzie M. Johnson
  - Department of Integrative Biology, University of Texas at Austin, Austin, TX, USA
- Anshuman Swain
  - Department of Biology, University of Maryland, College Park, MD, USA
- Laura Stolp
  - Graduate School of Science, University of Amsterdam, Amsterdam, The Netherlands
- Ashley I. Teufel
  - Department of Integrative Biology, University of Texas at Austin, Austin, TX, USA
  - Santa Fe Institute, Santa Fe, NM, USA
  - Texas A&M University, San Antonio, San Antonio, TX, USA
- April S. Kleppe
  - Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
  - Department of Clinical Medicine (MOMA), Aarhus University, Aarhus, Denmark
9
Papadopoulos C, Callebaut I, Gelly JC, Hatin I, Namy O, Renard M, Lespinet O, Lopes A. Intergenic ORFs as elementary structural modules of de novo gene birth and protein evolution. Genome Res 2021; 31:2303-2315. [PMID: 34810219 PMCID: PMC8647833 DOI: 10.1101/gr.275638.121] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/13/2021] [Accepted: 09/23/2021] [Indexed: 01/08/2023]
Abstract
The noncoding genome plays an important role in de novo gene birth and in the emergence of genetic novelty. Nevertheless, how noncoding sequences' properties could promote the birth of novel genes and shape the evolution and the structural diversity of proteins remains unclear. Therefore, by combining different bioinformatic approaches, we characterized the fold potential diversity of the amino acid sequences encoded by all intergenic open reading frames (ORFs) of S. cerevisiae with the aim of (1) exploring whether the structural states' diversity of proteomes is already present in noncoding sequences, and (2) estimating the potential of the noncoding genome to produce novel protein bricks that could either give rise to novel genes or be integrated into pre-existing proteins, thus participating in protein structure diversity and evolution. We showed that amino acid sequences encoded by most yeast intergenic ORFs contain the elementary building blocks of protein structures. Moreover, they encompass the large structural state diversity of canonical proteins, with the majority predicted as foldable. Then, we investigated the early stages of de novo gene birth by reconstructing the ancestral sequences of 70 yeast de novo genes and characterized the sequence and structural properties of intergenic ORFs with a strong translation signal. This enabled us to highlight sequence and structural factors determining de novo gene emergence. Finally, we showed a strong correlation between the fold potential of de novo proteins and one of their ancestral amino acid sequences, reflecting the relationship between the noncoding genome and the protein structure universe.
Affiliation(s)
- Chris Papadopoulos
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Isabelle Callebaut
  - Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, 75005 Paris, France
- Jean-Christophe Gelly
  - Université de Paris, Biologie Intégrée du Globule Rouge, UMR_S1134, BIGR, INSERM, F-75015 Paris, France
  - Laboratoire d'Excellence GR-Ex, 75015 Paris, France
  - Institut National de la Transfusion Sanguine, F-75015 Paris, France
- Isabelle Hatin
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Olivier Namy
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Maxime Renard
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Olivier Lespinet
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Anne Lopes
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
10
Menger FM, Rizvi SAA. Evolution of Complexity. Molecular Aspects of Preassembly. Molecules 2021; 26:6618. [PMID: 34771027 PMCID: PMC8587518 DOI: 10.3390/molecules26216618] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/24/2021] [Revised: 10/29/2021] [Accepted: 10/29/2021] [Indexed: 11/16/2022]
Abstract
An extension of neo-Darwinism, termed preassembly, states that genetic material required for many complex traits, such as echolocation, was present long before emergence of the traits. Assembly of genes and gene segments had occurred over protracted time-periods within large libraries of non-coding genes. Epigenetic factors ultimately promoted transfers from noncoding to coding genes, leading to abrupt formation of the trait via de novo genes. This preassembly model explains many observations that to this present day still puzzle biologists: formation of super-complexity in the absence of multiple fossil precursors, as with bat echolocation and flowering plants; major genetic and physical alterations occurring in just a few thousand years, as with housecat evolution; lack of precursors preceding lush periods of species expansion, as in the Cambrian explosion; and evolution of costly traits that exceed their need during evolutionary times, as with human intelligence. What follows in this paper is a mechanism that is not meant to supplant neo-Darwinism; instead, preassembly aims to supplement current ideas when complexity issues leave them struggling.
Affiliation(s)
- Syed A. A. Rizvi
  - School of Pharmacy, Hampton University, Hampton, VA 23669, USA
11
Lange A, Patel PH, Heames B, Damry AM, Saenger T, Jackson CJ, Findlay GD, Bornberg-Bauer E. Structural and functional characterization of a putative de novo gene in Drosophila. Nat Commun 2021; 12:1667. [PMID: 33712569 PMCID: PMC7954818 DOI: 10.1038/s41467-021-21667-6] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 01/09/2020] [Accepted: 02/03/2021] [Indexed: 11/26/2022]
Abstract
Comparative genomic studies have repeatedly shown that new protein-coding genes can emerge de novo from noncoding DNA. Still unknown is how and when the structures of encoded de novo proteins emerge and evolve. Combining biochemical, genetic and evolutionary analyses, we elucidate the function and structure of goddard, a gene which appears to have evolved de novo at least 50 million years ago within the Drosophila genus. Previous studies found that goddard is required for male fertility. Here, we show that Goddard protein localizes to elongating sperm axonemes and that in its absence, elongated spermatids fail to undergo individualization. Combining modelling, NMR and circular dichroism (CD) data, we show that Goddard protein contains a large central α-helix, but is otherwise partially disordered. We find similar results for Goddard's orthologs from divergent fly species and their reconstructed ancestral sequences. Accordingly, Goddard's structure appears to have been maintained with only minor changes over millions of years.
Affiliation(s)
- Andreas Lange
  - Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Prajal H Patel
  - Department of Biology, College of the Holy Cross, Worcester, MA, USA
- Brennen Heames
  - Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Adam M Damry
  - Research School of Chemistry, ANU College of Science, Canberra, Australia
- Thorsten Saenger
  - Department of Pediatric Kidney, Liver and Metabolic Diseases, Hannover Medical School, Hannover, Germany
- Colin J Jackson
  - Research School of Chemistry, ANU College of Science, Canberra, Australia
- Erich Bornberg-Bauer
  - Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
12
Puntambekar S, Newhouse R, San-Miguel J, Chauhan R, Vernaz G, Willis T, Wayland MT, Umrania Y, Miska EA, Prabakaran S. Evolutionary divergence of novel open reading frames in cichlids speciation. Sci Rep 2020; 10:21570. [PMID: 33299045 PMCID: PMC7726158 DOI: 10.1038/s41598-020-78555-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/02/2020] [Accepted: 11/26/2020] [Indexed: 01/02/2023]
Abstract
Novel open reading frames (nORFs) with coding potential may arise from noncoding DNA. Not much is known about their emergence, functional role, fixation in a population or contribution to adaptive radiation. Cichlid fishes exhibit extensive phenotypic diversification and speciation. Encounters with new environments alone are not sufficient to explain this striking diversity of cichlid radiation because other taxa coexistent with the Cichlidae demonstrate lower species richness. Wagner et al. analyzed cichlid diversification in 46 African lakes and reported that both extrinsic environmental factors and intrinsic lineage-specific traits related to sexual selection have strongly influenced the cichlid radiation, which indicates the existence of unknown molecular mechanisms responsible for rapid phenotypic diversification, such as emergence of novel open reading frames (nORFs). In this study, we integrated transcriptomic and proteomic signatures from two tissues of two cichlid species, identified nORFs and performed evolutionary analysis on these nORF regions. Our results suggest that the time scale of speciation of the two species and evolutionary divergence of these nORF genomic regions are similar and indicate a potential role for these nORFs in speciation of the cichlid fishes.
Affiliation(s)
- Shraddha Puntambekar
  - Department of Biology, Indian Institute of Science Education and Research, Pune, Maharashtra, 411008, India
- Rachel Newhouse
  - Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
- Jaime San-Miguel
  - Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
- Ruchi Chauhan
  - Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
- Grégoire Vernaz
  - Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
  - The Wellcome Trust/CRUK Gurdon Institute, University of Cambridge, Cambridge, CB2 1QN, UK
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, CB10 1SA, UK
- Thomas Willis
  - Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
- Matthew T Wayland
  - Department of Zoology, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
- Yagnesh Umrania
  - Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QR, UK
- Eric A Miska
  - Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, CB10 1SA, UK
  - Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QR, UK
- Sudhakaran Prabakaran
  - Department of Biology, Indian Institute of Science Education and Research, Pune, Maharashtra, 411008, India
  - Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
  - St. Edmund's College, University of Cambridge, Cambridge, CB3 0BN, UK
13
Dowling D, Schmitz JF, Bornberg-Bauer E. Stochastic Gain and Loss of Novel Transcribed Open Reading Frames in the Human Lineage. Genome Biol Evol 2020; 12:2183-2195. [PMID: 33210146 PMCID: PMC7674706 DOI: 10.1093/gbe/evaa194] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Accepted: 09/12/2020] [Indexed: 12/12/2022]
Abstract
In addition to known genes, much of the human genome is transcribed into RNA. Chance formation of novel open reading frames (ORFs) can lead to the translation of myriad new proteins. Some of these ORFs may yield advantageous adaptive de novo proteins. However, widespread translation of noncoding DNA can also produce hazardous protein molecules, which can misfold and/or form toxic aggregates. The dynamics of how de novo proteins emerge from potentially toxic raw materials and what influences their long-term survival are unknown. Here, using transcriptomic data from human and five other primates, we generate a set of transcribed human ORFs at six conservation levels to investigate which properties influence the early emergence and long-term retention of these expressed ORFs. As these taxa diverged from each other relatively recently, we present a fine scale view of the evolution of novel sequences over recent evolutionary time. We find that novel human-restricted ORFs are preferentially located on GC-rich gene-dense chromosomes, suggesting their retention is linked to pre-existing genes. Sequence properties such as intrinsic structural disorder and aggregation propensity, which have been proposed to play a role in survival of de novo genes, remain unchanged over time. Even very young sequences code for proteins with low aggregation propensities, suggesting that genomic regions with many novel transcribed ORFs are concomitantly less likely to produce ORFs which code for harmful toxic proteins. Our data indicate that the survival of these novel ORFs is largely stochastic rather than shaped by selection.
Affiliation(s)
- Daniel Dowling: Institute for Evolution and Biodiversity, University of Münster, Germany
- Jonathan F Schmitz: Institute for Evolution and Biodiversity, University of Münster, Germany
|
14
|
Arendsee Z, Li J, Singh U, Seetharam A, Dorman K, Wurtele ES. phylostratr: a framework for phylostratigraphy. Bioinformatics 2020; 35:3617-3627. [PMID: 30873536 DOI: 10.1093/bioinformatics/btz171] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 11/01/2018] [Revised: 02/27/2019] [Accepted: 03/13/2019] [Indexed: 12/20/2022] Open
Abstract
MOTIVATION: The goal of phylostratigraphy is to infer the evolutionary origin of each gene in an organism. This is done by searching for homologs within increasingly broad clades. The deepest clade that contains a homolog of the protein(s) encoded by a gene is that gene's phylostratum. RESULTS: We have created a general R-based framework, phylostratr, to estimate the phylostratum of every gene in a species. The program fully automates analysis: selecting species for balanced representation, retrieving sequences, building databases, inferring phylostrata and returning diagnostics. Key diagnostics include: detection of genes with inferred homologs in old clades, but not intermediate ones; proteome quality assessments; false-positive diagnostics; and checks for missing organellar genomes. phylostratr allows extensive customization and systematic comparisons of the influence of analysis parameters or genomes on phylostrata inference. A user may: modify the automatically generated clade tree or use their own tree; provide custom sequences in place of those automatically retrieved from UniProt; replace BLAST with an alternative algorithm; or tailor the method and sensitivity of the homology inference classifier. We show the utility of phylostratr through case studies in Arabidopsis thaliana and Saccharomyces cerevisiae. AVAILABILITY AND IMPLEMENTATION: Source code available at https://github.com/arendsee/phylostratr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
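The rule stated in this abstract (a gene's phylostratum is the deepest clade containing a homolog) can be illustrated with a minimal Python sketch. This is not phylostratr's code or API, which is R-based; the function name `assign_phylostratum`, the toy lineage, and the species-to-clade table are all hypothetical, and homology hits are assumed to have been computed beforehand (for example by a BLAST search).

```python
from typing import Dict, Iterable, List

# Hypothetical toy lineage of a focal species, ordered from the narrowest
# clade (the species itself) to the broadest.
LINEAGE: List[str] = ["Homo sapiens", "Primates", "Mammalia", "Eukaryota"]

# Hypothetical table: for each comparison species, the narrowest clade of
# LINEAGE that contains it.
SPECIES_TO_CLADE: Dict[str, str] = {
    "Pan troglodytes": "Primates",
    "Mus musculus": "Mammalia",
    "Saccharomyces cerevisiae": "Eukaryota",
}


def assign_phylostratum(
    lineage: List[str],
    species_to_clade: Dict[str, str],
    hit_species: Iterable[str],
) -> str:
    """Return the broadest clade in `lineage` that contains a homolog hit.

    `hit_species` lists the species in which a homolog of the gene was
    found. A gene without any hit is assigned to the focal species itself
    (lineage[0]), i.e. it is treated as an orphan.
    """
    rank = {clade: i for i, clade in enumerate(lineage)}
    deepest = 0  # index 0 is the focal species
    for species in hit_species:
        clade = species_to_clade.get(species)
        if clade is None or clade not in rank:
            continue  # species outside the table: ignored in this sketch
        deepest = max(deepest, rank[clade])
    return lineage[deepest]


if __name__ == "__main__":
    print(assign_phylostratum(LINEAGE, SPECIES_TO_CLADE, ["Pan troglodytes", "Mus musculus"]))
    print(assign_phylostratum(LINEAGE, SPECIES_TO_CLADE, []))
```

In this sketch a hit in a broad clade determines the result even when intermediate clades have no hit; the abstract lists exactly that pattern (homologs in old clades but not intermediate ones) among the cases the real tool flags in its diagnostics.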
Affiliation(s)
- Zebulun Arendsee: Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, USA; Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, USA; Center for Metabolic Biology, Iowa State University, Ames, IA, USA
- Jing Li: Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, USA; Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, USA
- Urminder Singh: Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, USA; Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, USA
- Arun Seetharam: Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, USA; Genome Informatics Facility, Iowa State University, Ames, IA, USA
- Karin Dorman: Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, USA; Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, USA; Department of Statistics, Iowa State University, Ames, IA, USA
- Eve Syrkin Wurtele: Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, USA; Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, USA; Center for Metabolic Biology, Iowa State University, Ames, IA, USA
|
15
|
Evolution of novel genes in three-spined stickleback populations. Heredity (Edinb) 2020; 125:50-59. [PMID: 32499660 PMCID: PMC7413265 DOI: 10.1038/s41437-020-0319-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 11/24/2019] [Revised: 04/27/2020] [Accepted: 04/30/2020] [Indexed: 12/22/2022] Open
Abstract
Eukaryotic genomes frequently acquire new protein-coding genes which may significantly impact an organism’s fitness. Novel genes can be created, for example, by duplication of large genomic regions or de novo, from previously non-coding DNA. Either way, creation of a novel transcript is an essential early step during novel gene emergence. Most studies on the gain-and-loss dynamics of novel genes so far have compared genomes between species, constraining analyses to genes that have remained fixed over long time scales. However, the importance of novel genes for rapid adaptation among populations has recently been shown. Therefore, since little is known about the evolutionary dynamics of transcripts across natural populations, we here study transcriptomes from several tissues and nine geographically distinct populations of an ecological model species, the three-spined stickleback. Our findings suggest that novel genes typically start out as transcripts with low expression and high tissue specificity. Early expression regulation appears to be mediated by gene-body methylation. Although most new and narrowly expressed genes are rapidly lost, those that survive and subsequently spread through populations tend to gain broader and higher expression levels. The properties of the encoded proteins, such as disorder and aggregation propensity, hardly change. Correspondingly, young novel genes are not preferentially under positive selection but older novel genes more often overlap with FST outlier regions. Taken together, expression of the surviving novel genes is rapidly regulated, probably via epigenetic mechanisms, while structural properties of encoded proteins are non-debilitating and might only change much later.
|
16
|
Thomas GWC, Dohmen E, Hughes DST, Murali SC, Poelchau M, Glastad K, Anstead CA, Ayoub NA, Batterham P, Bellair M, Binford GJ, Chao H, Chen YH, Childers C, Dinh H, Doddapaneni HV, Duan JJ, Dugan S, Esposito LA, Friedrich M, Garb J, Gasser RB, Goodisman MAD, Gundersen-Rindal DE, Han Y, Handler AM, Hatakeyama M, Hering L, Hunter WB, Ioannidis P, Jayaseelan JC, Kalra D, Khila A, Korhonen PK, Lee CE, Lee SL, Li Y, Lindsey ARI, Mayer G, McGregor AP, McKenna DD, Misof B, Munidasa M, Munoz-Torres M, Muzny DM, Niehuis O, Osuji-Lacy N, Palli SR, Panfilio KA, Pechmann M, Perry T, Peters RS, Poynton HC, Prpic NM, Qu J, Rotenberg D, Schal C, Schoville SD, Scully ED, Skinner E, Sloan DB, Stouthamer R, Strand MR, Szucsich NU, Wijeratne A, Young ND, Zattara EE, Benoit JB, Zdobnov EM, Pfrender ME, Hackett KJ, Werren JH, Worley KC, Gibbs RA, Chipman AD, Waterhouse RM, Bornberg-Bauer E, Hahn MW, Richards S. Gene content evolution in the arthropods. Genome Biol 2020; 21:15. [PMID: 31969194 PMCID: PMC6977273 DOI: 10.1186/s13059-019-1925-7] [Citation(s) in RCA: 106] [Impact Index Per Article: 26.5] [Received: 11/06/2019] [Accepted: 12/26/2019] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND: Arthropods comprise the largest and most diverse phylum on Earth and play vital roles in nearly every ecosystem. Their diversity stems in part from variations on a conserved body plan, resulting from and recorded in adaptive changes in the genome. Dissection of the genomic record of sequence change enables broad questions regarding genome evolution to be addressed, even across hyper-diverse taxa within arthropods. RESULTS: Using 76 whole genome sequences representing 21 orders spanning more than 500 million years of arthropod evolution, we document changes in gene and protein domain content and provide temporal and phylogenetic context for interpreting these innovations. We identify many novel gene families that arose early in the evolution of arthropods and during the diversification of insects into modern orders. We reveal unexpected variation in patterns of DNA methylation across arthropods and examples of gene family and protein domain evolution coincident with the appearance of notable phenotypic and physiological adaptations such as flight, metamorphosis, sociality, and chemoperception. CONCLUSIONS: These analyses demonstrate how large-scale comparative genomics can provide broad new insights into the genotype to phenotype map and generate testable hypotheses about the evolution of animal diversity.
Affiliation(s)
- Gregg W. C. Thomas: Department of Biology and Department of Computer Science, Indiana University, Bloomington, IN USA
- Elias Dohmen: Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany; Institute for Bioinformatics and Chemoinformatics, University of Hamburg, Hamburg, Germany; Westphalian University of Applied Sciences, 45665 Recklinghausen, Germany
- Daniel S. T. Hughes: Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA; Present Address: Institute for Genomic Medicine, Columbia University, New York, NY 10032 USA
- Shwetha C. Murali: Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA; Present Address: Howard Hughes Medical Institute, Department of Genome Sciences, University of Washington, Seattle, WA 98195 USA
- Monica Poelchau: National Agricultural Library, USDA, Beltsville, MD 20705 USA
- Karl Glastad: School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332 USA; Present Address: Penn Epigenetics Institute, Department of Cell and Developmental Biology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104 USA
- Clare A. Anstead: Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC 3010 Australia
- Nadia A. Ayoub: Department of Biology, Washington and Lee University, 204 West Washington Street, Lexington, VA 24450 USA
- Phillip Batterham: School of BioSciences Science Faculty, The University of Melbourne, Melbourne, VIC 3010 Australia
- Michelle Bellair: Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA; Present Address: CooperGenomics, Houston, TX USA
- Greta J. Binford: Department of Biology, Lewis & Clark College, Portland, OR 97219 USA
- Hsu Chao: Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
- Yolanda H. Chen: Department of Plant and Soil Sciences, University of Vermont, Burlington, USA
- Christopher Childers: National Agricultural Library, USDA, Beltsville, MD 20705 USA
- Huyen Dinh: Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
- Harsha Vardhan Doddapaneni: Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
- Jian J. Duan: Beneficial Insects Introduction Research Unit, United States Department of Agriculture, Agricultural Research Service, Newark, DE USA
- Shannon Dugan: Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
- Lauren A. Esposito: Institute for Biodiversity Science and Sustainability, California Academy of Sciences, 55 Music Concourse Drive, San Francisco, CA 94118 USA
- Markus Friedrich: Department of Biological Sciences, Wayne State University, Detroit, MI 48202 USA
- Jessica Garb: Department of Biological Sciences, University of Massachusetts Lowell, 198 Riverside Street, Lowell, MA 01854 USA
- Robin B. Gasser: Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC 3010 Australia
- Michael A. D. Goodisman: School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332 USA
- Dawn E. Gundersen-Rindal: USDA-ARS Invasive Insect Biocontrol and Behavior Laboratory, Beltsville, MD USA
- Yi Han: Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
- Alfred M. Handler: USDA-ARS, Center for Medical, Agricultural, and Veterinary Entomology, 1700 S.W. 23rd Drive, Gainesville, FL 32608 USA
- Masatsugu Hatakeyama: Division of Insect Sciences, National Institute of Agrobiological Sciences, Owashi, Tsukuba, 305-8634 Japan
- Lars Hering: Department of Zoology, Institute of Biology, University of Kassel, 34132 Kassel, Germany
- Wayne B. Hunter: USDA ARS, U. S. Horticultural Research Laboratory, Ft. Pierce, FL 34945 USA
- Panagiotis Ioannidis: Department of Genetic Medicine and Development and Swiss Institute of Bioinformatics, University of Geneva, 1211 Geneva, Switzerland; Present Address: Foundation for Research and Technology Hellas, Institute of Molecular Biology and Biotechnology, Vassilika Vouton, 70013 Heraklion, Greece
- Joy C. Jayaseelan: Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
- Divya Kalra: Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
- Abderrahman Khila: Université de Lyon, Institut de Génomique Fonctionnelle de Lyon, CNRS UMR 5242, Ecole Normale Supérieure de Lyon, Université Claude Bernard Lyon 1, 46 allée d’Italie, 69364 Lyon, France
- Pasi K. Korhonen: Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC 3010 Australia
- Carol Eunmi Lee: Department of Integrative Biology, University of Wisconsin, Madison, WI 53706 USA
- Sandra L. Lee: Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
- Yiyuan Li: Department of Biological Sciences, University of Notre Dame, 109B Galvin Life Sciences, Notre Dame, IN 46556 USA
- Amelia R. I. Lindsey: Department of Entomology, University of California Riverside, Riverside, CA USA; Present Address: Department of Biology, Indiana University, Bloomington, IN USA
- Georg Mayer: Department of Zoology, Institute of Biology, University of Kassel, 34132 Kassel, Germany
- Alistair P. McGregor: Department of Biological and Medical Sciences, Oxford Brookes University, Gipsy Lane, Oxford, OX3 0BP UK
- Duane D. McKenna: Department of Biological Sciences, University of Memphis, 3700 Walker Ave, Memphis, TN 38152 USA
- Bernhard Misof: Center for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, Bonn, Germany
- Mala Munidasa: Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
- Monica Munoz-Torres: Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, USA; Present Address: Phoenix Bioinformatics, 39221 Paseo Padre Parkway, Ste. J., Fremont, CA 94538 USA
- Donna M. Muzny: Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
- Oliver Niehuis: Evolutionary Biology and Ecology, Institute of Biology I (Zoology), Albert Ludwig University of Freiburg, 79104 Freiburg (Brsg.), Germany
- Nkechinyere Osuji-Lacy: Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
- Subba R. Palli: Department of Entomology, University of Kentucky, Lexington, KY 40546 USA
- Kristen A. Panfilio: School of Life Sciences, University of Warwick, Gibbet Hill Campus, Coventry, CV4 7AL UK
- Matthias Pechmann: Cologne Biocenter, Zoological Institute, Department of Developmental Biology, University of Cologne, 50674 Cologne, Germany
- Trent Perry: School of BioSciences Science Faculty, The University of Melbourne, Melbourne, VIC 3010 Australia
- Ralph S. Peters: Centre of Taxonomy and Evolutionary Research, Arthropoda Department, Zoological Research Museum Alexander Koenig, Bonn, Germany
- Helen C. Poynton: School for the Environment, University of Massachusetts Boston, Boston, MA 02125 USA
- Nikola-Michael Prpic: Johann-Friedrich-Blumenbach-Institut für Zoologie und Anthropologie, Abteilung für Entwicklungsbiologie, Georg-August-Universität Göttingen, Göttingen, Germany; Göttingen Center for Molecular Biosciences (GZMB), Georg-August-Universität Göttingen, Göttingen, Germany
- Jiaxin Qu: Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
- Dorith Rotenberg: Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, NC 27606 USA
- Coby Schal: Department of Entomology and W.M. Keck Center for Behavioral Biology, North Carolina State University, Raleigh, NC 27695 USA
- Sean D. Schoville: Department of Entomology, University of Wisconsin-Madison, Madison, USA
- Erin D. Scully: Stored Product Insect and Engineering Research Unit, USDA-ARS Center for Grain and Animal Health Research, Manhattan, KS 66502 USA
- Evette Skinner: Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
- Daniel B. Sloan: Department of Biology, Colorado State University, Ft. Collins, CO USA
- Richard Stouthamer: Department of Entomology, University of California Riverside, Riverside, CA USA
- Michael R. Strand: Department of Entomology, University of Georgia, Athens, GA USA
- Nikolaus U. Szucsich: Present Address: Arkansas Biosciences Institute, Arkansas State University, Jonesboro, AR USA
- Asela Wijeratne: Department of Biological Sciences, University of Memphis, 3700 Walker Ave, Memphis, TN 38152 USA; Natural History Museum Vienna, Burgring 7, 1010 Vienna, Austria
- Neil D. Young: Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC 3010 Australia
- Eduardo E. Zattara: INIBIOMA, Univ. Nacional del Comahue – CONICET, Bariloche, Argentina
- Joshua B. Benoit: Department of Biological Sciences, University of Cincinnati, Cincinnati, OH 45221 USA
- Evgeny M. Zdobnov: Department of Genetic Medicine and Development and Swiss Institute of Bioinformatics, University of Geneva, 1211 Geneva, Switzerland
- Michael E. Pfrender: Department of Biological Sciences, University of Notre Dame, 109B Galvin Life Sciences, Notre Dame, IN 46556 USA
- Kevin J. Hackett: Crop Production and Protection, U.S. Department of Agriculture-Agricultural Research Service, Beltsville, MD 20705 USA
- John H. Werren: Department of Biology, University of Rochester, Rochester, NY 14627 USA
- Kim C. Worley: Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
- Richard A. Gibbs: Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
- Ariel D. Chipman: Department of Ecology, Evolution and Behavior, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Edmond J. Safra Campus, Givat Ram, 91904 Jerusalem, Israel
- Robert M. Waterhouse: Department of Ecology & Evolution and Swiss Institute of Bioinformatics, University of Lausanne, 1015 Lausanne, Switzerland
- Erich Bornberg-Bauer: Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany; Institute for Bioinformatics and Chemoinformatics, University of Hamburg, Hamburg, Germany; Department Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Matthew W. Hahn: Department of Biology and Department of Computer Science, Indiana University, Bloomington, IN USA
- Stephen Richards: Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA; Present Address: UC Davis Genome Center, University of California, Davis, CA 95616 USA
|
17
|
Prabh N, Rödelsperger C. De Novo, Divergence, and Mixed Origin Contribute to the Emergence of Orphan Genes in Pristionchus Nematodes. G3 (Bethesda) 2019; 9:2277-2286. [PMID: 31088903 PMCID: PMC6643871 DOI: 10.1534/g3.119.400326] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 02/04/2019] [Accepted: 05/11/2019] [Indexed: 12/30/2022]
Abstract
Homology is a fundamental concept in comparative biology. It is extensively used at the sequence level to make phylogenetic hypotheses and functional inferences. Nonetheless, the majority of eukaryotic genomes contain large numbers of orphan genes lacking homologs in other taxa. Generally, the fraction of orphan genes is higher in genomically undersampled clades, and in the absence of closely related genomes any hypothesis about their origin and evolution remains untestable. Previously, we sequenced ten genomes with an underlying ladder-like phylogeny to establish a phylogenomic framework for studying genome evolution in diplogastrid nematodes. Here, we use this deeply sampled data set to understand the processes that generate orphan genes in our focal species Pristionchus pacificus. Based on phylostratigraphic analysis and additional bioinformatic filters, we obtained 29 high-confidence candidate genes for which mechanisms of orphan origin were proposed based on manual inspection. This revealed diverse mechanisms including annotation artifacts, chimeric origin, alternative reading frame usage, and gene splitting with subsequent gain of de novo exons. In addition, we present two cases of complete de novo origination from non-coding regions, which represents one of the first reports of de novo genes in nematodes. Thus, we conclude that de novo emergence, divergence, and mixed mechanisms contribute to novel gene formation in Pristionchus nematodes.
Affiliation(s)
- Neel Prabh: Department of Integrative Evolutionary Biology, Max-Planck-Institute for Developmental Biology, Max-Planck-Ring 9, 72076 Tübingen, Germany; Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Biology, August Thienemann Str. 2, 24306 Plön, Germany
- Christian Rödelsperger: Department of Integrative Evolutionary Biology, Max-Planck-Institute for Developmental Biology, Max-Planck-Ring 9, 72076 Tübingen, Germany
|
18
|
Affiliation(s)
- Stephen Branden Van Oss: Department of Computational and Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America
- Anne-Ruxandra Carvunis: Department of Computational and Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America
|
19
|
Rapid evolution of protein diversity by de novo origination in Oryza. Nat Ecol Evol 2019; 3:679-690. [PMID: 30858588 DOI: 10.1038/s41559-019-0822-5] [Citation(s) in RCA: 85] [Impact Index Per Article: 17.0] [Received: 05/08/2018] [Accepted: 01/23/2019] [Indexed: 12/22/2022]
Abstract
New protein-coding genes that arise de novo from non-coding DNA sequences contribute to protein diversity. However, de novo gene origination is challenging to study as it requires high-quality reference genomes for closely related species, evidence for ancestral non-coding sequences, and transcription and translation of the new genes. High-quality genomes of 13 closely related Oryza species provide unprecedented opportunities to understand de novo origination events. Here, we identify a large number of young de novo genes with discernible recent ancestral non-coding sequences and evidence of translation. Using pipelines examining the synteny relationship between genomes and reciprocal-best whole-genome alignments, we detected at least 175 de novo open reading frames in the focal species O. sativa subspecies japonica, which were all detected in RNA sequencing-based transcriptomes. Mass spectrometry-based targeted proteomics and ribosomal profiling show translational evidence for 57% of the de novo genes. In recent divergence of Oryza, an average of 51.5 de novo genes per million years were generated and retained. We observed evolutionary patterns in which excess indels and early transcription were favoured in origination with a stepwise formation of gene structure. These data reveal that de novo genes contribute to the rapid evolution of protein diversity under positive selection.
|
20
|
Exaptation at the molecular genetic level. Sci China Life Sci 2018; 62:437-452. [PMID: 30798493 DOI: 10.1007/s11427-018-9447-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 11/10/2018] [Accepted: 12/01/2018] [Indexed: 12/22/2022]
Abstract
The realization that body parts of animals and plants can be recruited or coopted for novel functions dates back to, or even predates, the observations of Darwin. S.J. Gould and E.S. Vrba recognized a mode of evolution of characters that differs from adaptation. The umbrella term aptation was supplemented with the concept of exaptation. Unlike adaptations, which are restricted to features built by selection for their current role, exaptations are features that currently enhance fitness, even though their present role was not a result of natural selection. Exaptations can also arise from nonaptations; these are characters which had previously been evolving neutrally. All nonaptations are potential exaptations. The concept of exaptation was expanded to the molecular genetic level, which aided greatly in understanding the enormous potential of neutrally evolving repetitive DNA (including transposed elements, formerly considered junk DNA) for the evolution of genes and genomes. The distinction between adaptations and exaptations is outlined in this review and examples are given. Also elaborated on is the fact that such distinctions are sometimes more difficult to determine; this is a widespread phenomenon in biology, where continua abound and clear borders between states and definitions are rare.
|